OpenAI Atari Games: Deep Reinforcement Learning with PyTorch

By Nasrudin Bin Salim

Requirements: Python 2.7 and a Linux/UNIX environment.

Please install:
PyTorch
OpenAI Gym
OpenAI Universe
cv2 (OpenCV)
scikit-image (imported below for rgb2gray)

Imports


In [1]:
from __future__ import print_function, division
import numpy as np

import json
import logging
from cv2 import resize
from skimage.color import rgb2gray
import os
os.environ["OMP_NUM_THREADS"] = "1" #should be set to 1 to prevent conflicts
import time



from torch.autograd import Variable

Import the OpenAI Universe wrappers and Gym

Import PyTorch for the model


In [2]:
import gym
from universe import vectorized
from universe.wrappers import Unvectorize, Vectorize

from gym.spaces.box import Box
from gym.configuration import undo_logger_setup

import torch
from torch.multiprocessing import Process
import torch.nn.functional as F


import torch.optim as optim

Helper Functions Section

Logging function


In [3]:
def setup_logger(logger_name, log_file, level=logging.INFO):
    
    '''Set up a named logger that writes to a log file and to the console.'''
    # Get (or create) the logger registered under this name
    l = logging.getLogger(logger_name)
    
    #Formatter
    formatter = logging.Formatter('%(asctime)s : %(message)s')
    
    #file handler    
    fileHandler = logging.FileHandler(log_file, mode='w')
    fileHandler.setFormatter(formatter)
    
    #streamhandler
    streamHandler = logging.StreamHandler()
    streamHandler.setFormatter(formatter)
    
    #add the above handlers to the logger instance
    l.setLevel(level)
    l.addHandler(fileHandler)
    l.addHandler(streamHandler)

Read JSON config


In [4]:
def read_config(file_path):
    """Read JSON config."""
    #use the context manager
    with open(file_path, 'r') as f:
        json_object = json.load(f)
        
    return json_object
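
The frame preprocessing further down looks up the keys `crop1`, `crop2` and `dimension2` in the environment config. Below is a minimal sketch of what such a JSON file might contain and how it is read back. The game name, the nesting by game, and the numeric values are assumptions for illustration, not the author's settings.

```python
import json
import os
import tempfile

# Hypothetical per-game settings; only the key names come from the notebook.
example_conf = {
    "Pong": {"crop1": 34, "crop2": 34, "dimension2": 80},
}


def read_config(file_path):
    """Read a JSON config (same logic as the helper above)."""
    with open(file_path, 'r') as f:
        return json.load(f)


# Write the example to a temporary file, then read it back.
conf_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(conf_path, 'w') as f:
    json.dump(example_conf, f)

loaded = read_config(conf_path)
env_conf = loaded["Pong"]
print(env_conf["crop1"])
```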

Share gradients between two models

More on this later


In [5]:
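def ensure_shared_grads(model, shared_model):
    """Point the shared model's gradients at the local model's gradients."""
    for param, shared_param in zip(model.parameters(),
                                   shared_model.parameters()):
        # Shared gradients are already set: nothing to do
        if shared_param.grad is not None:
            return
        # Hand the local gradient over to the shared parameter
        shared_param._grad = param.grad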

Environment: setting up OpenAI Gym and Universe

Function to create the Atari environment


In [6]:
def atari_env(env_id, env_conf):
    env = gym.make(env_id)
    if len(env.observation_space.shape) > 1:
        env = Vectorize(env)
        env = AtariRescale(env, env_conf)
        env = NormalizedEnv(env)
        env = Unvectorize(env)
        
    return env

Preprocess a frame for the environment: crop, convert to grayscale, and resize to 80x80


In [7]:
def _process_frame(frame, conf):
    frame = frame[conf["crop1"]:conf["crop2"] + 160, :160]
    frame = resize(rgb2gray(frame), (80, conf["dimension2"]))
    frame = resize(frame, (80, 80))
    frame = np.reshape(frame, [1, 80, 80])
    return frame
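
The first line of `_process_frame` keeps rows `crop1` up to `crop2 + 160` and the first 160 columns. Below is a minimal sketch of that slice on a plain nested list, assuming a 210 x 160 frame and made-up crop offsets. The real per-game values come from the config file.

```python
# Assumed crop offsets, for illustration only.
conf = {"crop1": 34, "crop2": 34}

# Stand-in for one colour channel of a 210-row by 160-column frame.
frame = [[0] * 160 for _ in range(210)]

# Same slice as in _process_frame: rows crop1 .. crop2 + 160, first 160 columns.
cropped = [row[:160] for row in frame[conf["crop1"]:conf["crop2"] + 160]]

cropped_shape = (len(cropped), len(cropped[0]))
print(cropped_shape)
```

When `crop1` equals `crop2` the crop is exactly 160 rows tall. A larger `crop2` keeps more rows, and the two resize calls then bring the frame to 80 x 80.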

Atari rescale class


In [8]:
class AtariRescale(vectorized.ObservationWrapper):
    def __init__(self, env, env_conf):
        super(AtariRescale, self).__init__(env)
        self.observation_space = Box(0.0, 1.0, [1, 80, 80])
        self.conf = env_conf

    def _observation(self, observation_n):
        return [
            _process_frame(observation, self.conf)
            for observation in observation_n
        ]

Normalized environment class, which normalizes each observation using a bias-corrected running mean and standard deviation


In [9]:
class NormalizedEnv(vectorized.ObservationWrapper):
    def __init__(self, env=None):
        super(NormalizedEnv, self).__init__(env)
        self.state_mean = 0
        self.state_std = 0
        self.alpha = 0.9999
        self.num_steps = 0

    def _observation(self, observation_n):
        for observation in observation_n:
            self.num_steps += 1
            self.state_mean = self.state_mean * self.alpha + \
                observation.mean() * (1 - self.alpha)
            self.state_std = self.state_std * self.alpha + \
                observation.std() * (1 - self.alpha)

        unbiased_mean = self.state_mean / (1 - pow(self.alpha, self.num_steps))
        unbiased_std = self.state_std / (1 - pow(self.alpha, self.num_steps))

        return [(observation - unbiased_mean) / (unbiased_std + 1e-8)
                for observation in observation_n]
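
`NormalizedEnv` starts both running statistics at 0 and uses `alpha = 0.9999`, so the raw averages would stay close to zero for thousands of steps. Dividing by `1 - alpha ** num_steps` removes that start-up bias. Below is a minimal sketch of the same correction on plain floats; the function name and the input values are illustrative.

```python
def bias_corrected_ema(values, alpha=0.9999):
    """Return the bias-corrected exponential moving average after each value."""
    ema = 0.0
    corrected = []
    for step, value in enumerate(values, start=1):
        # Same update as state_mean in NormalizedEnv
        ema = ema * alpha + value * (1 - alpha)
        # Same correction as unbiased_mean in NormalizedEnv
        corrected.append(ema / (1 - alpha ** step))
    return corrected


# A constant input should give a constant estimate from the very first step.
estimates = bias_corrected_ema([5.0, 5.0, 5.0])
print(estimates)
```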

Model

Using Google DeepMind's idea.

Research Paper: https://arxiv.org/pdf/1602.01783.pdf
Asynchronous Advantage Actor-Critic (A3C)


The A3C algorithm was released by Google’s DeepMind group in 2016, and it made a splash by essentially obsoleting DQN. It was faster, simpler, more robust, and able to achieve much better scores on the standard battery of Deep RL tasks. On top of all that, it could work in continuous as well as discrete action spaces. Given this, it has become the go-to Deep RL algorithm for new challenging problems with complex state and action spaces.

Medium article explaining A3C reinforcement learning

The Actor-Critic Structure

Many workers train and learn concurrently, then update the global network with their gradients

Process Flow

Long Short Term Memory Recurrent Neural Nets

Implementing LSTM and A3C with PyTorch

Created as a module and then imported


In [10]:
from A3CModel import A3Clstm

The player Agent

(Reinforcement Learning agent to interact with the env)


In [11]:
class Agent(object):
    def __init__(self, model, env, args, state):
        self.model = model
        self.env = env
        self.current_life = 0
        self.state = state
        self.hx = None
        self.cx = None
        self.eps_len = 0
        self.args = args
        self.values = []
        self.log_probs = []
        self.rewards = []
        self.entropies = []
        self.done = True
        self.info = None
        self.reward = 0

    def action_train(self):
        if self.done:
            self.cx = Variable(torch.zeros(1, 512))
            self.hx = Variable(torch.zeros(1, 512))
        else:
            self.cx = Variable(self.cx.data)
            self.hx = Variable(self.hx.data)
        value, logit, (self.hx, self.cx) = self.model((Variable(self.state.unsqueeze(0)), (self.hx, self.cx)))
        prob = F.softmax(logit)
        log_prob = F.log_softmax(logit)
        entropy = -(log_prob * prob).sum(1)
        self.entropies.append(entropy)
        action = prob.multinomial().data
        log_prob = log_prob.gather(1, Variable(action))
        state, self.reward, self.done, self.info = self.env.step(action.numpy())
        self.state = torch.from_numpy(state).float()
        self.eps_len += 1
        self.done = self.done or self.eps_len >= self.args['M']
        self.reward = max(min(self.reward, 1), -1)
        self.values.append(value)
        self.log_probs.append(log_prob)
        self.rewards.append(self.reward)
        return self

    def action_test(self):
        if self.done:
            self.cx = Variable(torch.zeros(1, 512), volatile=True)
            self.hx = Variable(torch.zeros(1, 512), volatile=True)
        else:
            self.cx = Variable(self.cx.data, volatile=True)
            self.hx = Variable(self.hx.data, volatile=True)
        value, logit, (self.hx, self.cx) = self.model((Variable(self.state.unsqueeze(0), volatile=True), (self.hx, self.cx)))
        prob = F.softmax(logit)
        action = prob.max(1)[1].data.numpy()
        state, self.reward, self.done, self.info = self.env.step(action[0])
        self.state = torch.from_numpy(state).float()
        self.eps_len += 1
        self.done = self.done or self.eps_len >= self.args['M']
        return self

    def check_state(self):
        if self.current_life > self.info['ale.lives']:
            self.done = True
        self.current_life = self.info['ale.lives']
        return self

    def clear_actions(self):
        self.values = []
        self.log_probs = []
        self.rewards = []
        self.entropies = []
        return self
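
In `action_train` the agent samples an action from the softmax of the logits and stores the policy entropy, while `action_test` simply takes the most probable action. Below is a minimal sketch of those quantities on plain Python lists, without PyTorch. The logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy -sum(p * log p), skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = softmax([0.0, 0.0, 0.0, 0.0])   # undecided policy
peaked = softmax([10.0, 0.0, 0.0, 0.0])   # nearly deterministic policy

# Greedy choice, as in action_test: index of the largest probability.
greedy = max(range(len(peaked)), key=lambda i: peaked[i])

print("{0:.4f} {1:.4f} {2}".format(entropy(uniform), entropy(peaked), greedy))
```

The undecided policy has the highest possible entropy for four actions, `log(4)`. The training loss subtracts a small multiple of the entropy, which nudges the policy away from collapsing onto one action too early.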

Shared memory and optimization algorithms

As part of the A3C network, multiple workers work together to update a global network.

RMSprop

RMSprop is an unpublished, adaptive learning rate method proposed by Geoff Hinton in Lecture 6e of his Coursera class.

RMSprop and Adadelta were developed independently around the same time, both stemming from the need to resolve Adagrad's radically diminishing learning rates. RMSprop is in fact identical to the first update vector of Adadelta.

RMSprop also divides the learning rate by an exponentially decaying average of squared gradients. Hinton suggests setting the decay rate γ to 0.9, while a good default value for the learning rate η is 0.001.
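
The update described above can be sketched for a single scalar parameter as follows. The defaults are the values quoted in the text (γ = 0.9, η = 0.001). The small `eps` term and the toy objective are assumptions added for the illustration, and library implementations can differ in details such as where `eps` is added.

```python
import math


def rmsprop_step(param, grad, avg_sq, lr=0.001, gamma=0.9, eps=1e-8):
    """One RMSprop update for a single scalar parameter."""
    # Exponentially decaying average of squared gradients
    avg_sq = gamma * avg_sq + (1 - gamma) * grad ** 2
    # Divide the learning rate by the root of that average
    param = param - lr * grad / math.sqrt(avg_sq + eps)
    return param, avg_sq


# Toy problem: minimise f(x) = x^2, whose gradient is 2x.
x, avg_sq = 1.0, 0.0
for _ in range(3):
    x, avg_sq = rmsprop_step(x, 2 * x, avg_sq)
print("{0:.6f} {1:.6f}".format(x, avg_sq))
```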


In [12]:
from SharedOptimizers import SharedRMSprop

Adaptive Moment Estimation (Adam)

Adam is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients v_t, like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum.

Adam (short for Adaptive Moment Estimation) is an update to the RMSprop optimizer. In this optimization algorithm, running averages of both the gradients and the second moments of the gradients are used.
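
The two running averages and the resulting update can be sketched for a single scalar parameter as follows. The defaults (`beta1 = 0.9`, `beta2 = 0.999`, `eps = 1e-8`, `lr = 0.001`) are the ones suggested in the Adam paper; the function name and the toy gradient are illustrative.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    # Decaying average of past gradients (first moment)
    m = beta1 * m + (1 - beta1) * grad
    # Decaying average of past squared gradients (second moment)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Both averages start at zero, so correct their bias towards zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from x = 1.0 with gradient 2.0 (toy values).
x1, m1, v1 = adam_step(1.0, 2.0, 0.0, 0.0, 1)
print("{0:.6f}".format(x1))
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale.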


In [13]:
from SharedOptimizers import SharedAdam

Adam variant with a shared learning-rate schedule


In [14]:
from SharedOptimizers import SharedLrSchedAdam

Functions to run the model on the environment

Test

Function to test the model on a game environment


In [15]:
def test(args, shared_model, env_conf, render=False):
    log = {}
    setup_logger('{}_log'.format(args['ENV']),
                 r'{0}{1}_log'.format(args['LG'], args['ENV']))
    log['{}_log'.format(args['ENV'])] = logging.getLogger(
        '{}_log'.format(args['ENV']))
    d_args = args
    for k in d_args.keys():
        log['{}_log'.format(args['ENV'])].info('{0}: {1}'.format(k, d_args[k]))

    torch.manual_seed(args['seed'])
    env = atari_env(args['ENV'], env_conf)
    reward_sum = 0
    start_time = time.time()
    num_tests = 0
    reward_total_sum = 0
    player = Agent(None, env, args, None)
    player.model = A3Clstm(
        player.env.observation_space.shape[0], player.env.action_space)
    player.state = player.env.reset()
    player.state = torch.from_numpy(player.state).float()
    player.model.eval()

    while True:
        if player.done:
            player.model.load_state_dict(shared_model.state_dict())
        if render:
            env.render()
        player.action_test()
        reward_sum += player.reward

        if player.done:
            num_tests += 1
            player.current_life = 0
            reward_total_sum += reward_sum
            reward_mean = reward_total_sum / num_tests
            log['{}_log'.format(args['ENV'])].info(
                "Time {0}, episode reward {1}, episode length {2}, reward mean {3:.4f}".
                format(
                    time.strftime("%Hh %Mm %Ss",
                                  time.gmtime(time.time() - start_time)),
                    reward_sum, player.eps_len, reward_mean))

            if reward_sum > args['SSL']:
                player.model.load_state_dict(shared_model.state_dict())
                state_to_save = player.model.state_dict()
                torch.save(state_to_save, '{0}{1}.dat'.format(
                    args['SMD'], args['ENV']))

            reward_sum = 0
            player.eps_len = 0
            state = player.env.reset()
            time.sleep(60)
            player.state = torch.from_numpy(state).float()

Train

Function to train the model with an optimization algorithm on an environment


In [16]:
def train(rank, args, shared_model, optimizer, env_conf):

    torch.manual_seed(args['seed'] + rank)
    env = atari_env(args['ENV'], env_conf)
    if optimizer is None:
        if args['OPT'] == 'RMSprop':
            optimizer = optim.RMSprop(shared_model.parameters(), lr=args['LR'])
        if args['OPT'] == 'Adam':
            optimizer = optim.Adam(shared_model.parameters(), lr=args['LR'])

    env.seed(args['seed'] + rank)
    player = Agent(None, env, args, None)
    player.model = A3Clstm(
        player.env.observation_space.shape[0], player.env.action_space)
    player.state = player.env.reset()
    player.state = torch.from_numpy(player.state).float()
    player.model.train()

    while True:
        player.model.load_state_dict(shared_model.state_dict())
        for step in range(args['NS']):
            player.action_train()
            if args['CL']:
                player.check_state()
            if player.done:
                break

        if player.done:
            player.eps_len = 0
            player.current_life = 0
            state = player.env.reset()
            player.state = torch.from_numpy(state).float()

        R = torch.zeros(1, 1)
        if not player.done:
            value, _, _ = player.model(
                (Variable(player.state.unsqueeze(0)), (player.hx, player.cx)))
            R = value.data

        player.values.append(Variable(R))
        policy_loss = 0
        value_loss = 0
        R = Variable(R)
        gae = torch.zeros(1, 1)
        for i in reversed(range(len(player.rewards))):
            R = args['G'] * R + player.rewards[i]
            advantage = R - player.values[i]
            value_loss = value_loss + 0.5 * advantage.pow(2)

            # Generalized Advantage Estimation
            delta_t = player.rewards[i] + args['G'] * \
                player.values[i + 1].data - player.values[i].data
            gae = gae * args['G'] * args['T'] + delta_t

            policy_loss = policy_loss - \
                player.log_probs[i] * \
                Variable(gae) - 0.01 * player.entropies[i]

        optimizer.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        torch.nn.utils.clip_grad_norm(player.model.parameters(), 40)
        ensure_shared_grads(player.model, shared_model)
        optimizer.step()
        player.clear_actions()
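
The backward loop in `train` accumulates two quantities per step: the discounted return `R`, used for the value loss, and the generalized advantage estimate `gae`, which weights the log-probabilities in the policy loss. Below is a minimal sketch of just that arithmetic on plain floats, with no autograd. Here `gamma` and `tau` stand in for `args['G']` and `args['T']`, and the reward and value numbers are made up for illustration.

```python
def returns_and_gae(rewards, values, gamma, tau):
    """Compute discounted returns and GAE advantages, last step first.

    values must hold len(rewards) + 1 entries; the final entry is the
    bootstrap value (0.0 when the episode has ended).
    """
    R = values[-1]
    gae = 0.0
    returns, advantages = [], []
    for i in reversed(range(len(rewards))):
        # Discounted return, as in the value loss
        R = gamma * R + rewards[i]
        # One-step TD error
        delta = rewards[i] + gamma * values[i + 1] - values[i]
        # Exponentially weighted sum of TD errors
        gae = gae * gamma * tau + delta
        returns.append(R)
        advantages.append(gae)
    returns.reverse()
    advantages.reverse()
    return returns, advantages


rewards = [1.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]   # value estimates, last one is the bootstrap
returns, advantages = returns_and_gae(rewards, values, gamma=0.5, tau=1.0)
print(returns)
print(advantages)
```

With `tau = 1` the accumulated advantage equals the return minus the value estimate of that step. Smaller values of `tau` put more weight on the near-term TD errors.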

Putting it all together

List of games: pick one here, then edit the environment settings accordingly

Choose an Atari game whose observations are screen images, since the model takes a 4D tensor as input.
If you don't know what that means, just guess and check.

In [17]:
gym.envs.registry.all()


Out[17]:
[EnvSpec(flashgames.UrbanMicroRacers-v0),
 EnvSpec(flashgames.PlopPlopLite-v0),
 EnvSpec(DoubleDunk-ramDeterministic-v4),
 EnvSpec(flashgames.Sieger2LevelPack-v0),
 EnvSpec(DoubleDunk-ramDeterministic-v0),
 EnvSpec(gym-core.Krull-v0),
 EnvSpec(gym-core.Krull-v3),
 EnvSpec(Pooyan-ram-v4),
 EnvSpec(Pooyan-ram-v0),
 EnvSpec(flashgames.GonAndMon-v0),
 EnvSpec(flashgames.NeonRaceLvl4-v0),
 EnvSpec(flashgames.Hash-v0),
 EnvSpec(gym-core.JourneyEscapeSlow-v3),
 EnvSpec(gym-core.JourneyEscapeSlow-v0),
 EnvSpec(flashgames.FlashBombs-v0),
 EnvSpec(gym-core.JamesbondDeterministicSlow-v0),
 EnvSpec(VentureNoFrameskip-v0),
 EnvSpec(Centipede-v0),
 EnvSpec(Centipede-v4),
 EnvSpec(flashgames.Crumbs2-v0),
 EnvSpec(flashgames.CosmoGravity2-v0),
 EnvSpec(gym-core.Zaxxon30FPS-v0),
 EnvSpec(gym-core.Zaxxon30FPS-v3),
 EnvSpec(flashgames.ZombiesAndDonuts-v0),
 EnvSpec(Frostbite-ramNoFrameskip-v0),
 EnvSpec(Frostbite-ramNoFrameskip-v4),
 EnvSpec(IceHockey-ramNoFrameskip-v4),
 EnvSpec(flashgames.SpacePunkRacerLvl8-v0),
 EnvSpec(flashgames.EvolutionRacingLvl8-v0),
 EnvSpec(flashgames.GalacticGems2NewFrontiers-v0),
 EnvSpec(flashgames.DisasterWillStrikeDefender-v0),
 EnvSpec(flashgames.BikeTrial2-v0),
 EnvSpec(gym-core.SolarisDeterministic-v3),
 EnvSpec(gym-core.SolarisDeterministic-v0),
 EnvSpec(AirRaidNoFrameskip-v4),
 EnvSpec(gym-core.BerzerkSlow-v3),
 EnvSpec(AirRaidNoFrameskip-v0),
 EnvSpec(gym-core.BankHeistSync-v3),
 EnvSpec(flashgames.NinjaTrainingWorlds-v0),
 EnvSpec(flashgames.3dFlashRacer-v0),
 EnvSpec(KrullDeterministic-v0),
 EnvSpec(SemisuperPendulumRandom-v0),
 EnvSpec(KrullDeterministic-v4),
 EnvSpec(wob.real.Quizlet-Planet-Test-v0),
 EnvSpec(Go19x19-v0),
 EnvSpec(gym-core.CrazyClimber30FPS-v0),
 EnvSpec(flashgames.GravityBall-v0),
 EnvSpec(gym-core.CrazyClimber30FPS-v3),
 EnvSpec(gym-core.ChopperCommandDeterministicSlow-v0),
 EnvSpec(gym-core.ChopperCommandDeterministicSlow-v3),
 EnvSpec(gym-core.CentipedeDeterministicSlow-v0),
 EnvSpec(gym-core.Zaxxon-v0),
 EnvSpec(gym-core.CentipedeDeterministicSlow-v3),
 EnvSpec(FishingDerbyNoFrameskip-v4),
 EnvSpec(FishingDerbyNoFrameskip-v0),
 EnvSpec(flashgames.FormulaXspeed3d-v0),
 EnvSpec(flashgames.BobbyNutcaseMotoJumping-v0),
 EnvSpec(flashgames.RollerRider-v0),
 EnvSpec(gym-core.YarsRevengeNoFrameskip-v0),
 EnvSpec(gym-core.Assault30FPS-v0),
 EnvSpec(gym-core.Assault30FPS-v3),
 EnvSpec(Qbert-v4),
 EnvSpec(Qbert-v0),
 EnvSpec(flashgames.BusinessmanSimulator-v0),
 EnvSpec(gym-core.MontezumaRevenge30FPS-v3),
 EnvSpec(flashgames.PunchBallJump-v0),
 EnvSpec(Robotank-ram-v4),
 EnvSpec(flashgames.SpectrumRunner-v0),
 EnvSpec(flashgames.LooneyAndJohny-v0),
 EnvSpec(gym-core.TennisDeterministicSync-v3),
 EnvSpec(flashgames.TheThreeTowers-v0),
 EnvSpec(gym-core.TennisDeterministicSync-v0),
 EnvSpec(flashgames.DriveToWreck-v0),
 EnvSpec(flashgames.SkiSim-v0),
 EnvSpec(MontezumaRevenge-ramNoFrameskip-v4),
 EnvSpec(MontezumaRevenge-ramNoFrameskip-v0),
 EnvSpec(gym-core.BerzerkDeterministicSync-v0),
 EnvSpec(flashgames.FlashRacer-v0),
 EnvSpec(flashgames.Sundrops-v0),
 EnvSpec(gym-core.TimePilot30FPS-v0),
 EnvSpec(gym-core.TimePilot30FPS-v3),
 EnvSpec(flashgames.Rocketeer-v0),
 EnvSpec(Go9x9-v0),
 EnvSpec(flashgames.MineDrop-v0),
 EnvSpec(flashgames.RollingHills-v0),
 EnvSpec(flashgames.DrawGems-v0),
 EnvSpec(MountainCarContinuous-v0),
 EnvSpec(gym-core.CrazyClimberDeterministicSync-v3),
 EnvSpec(gym-core.CrazyClimberDeterministicSync-v0),
 EnvSpec(gym-core.EnduroSync-v0),
 EnvSpec(flashgames.WastelandSiege-v0),
 EnvSpec(gym-core.EnduroSync-v3),
 EnvSpec(gym-core.Zaxxon-v3),
 EnvSpec(Pong-v4),
 EnvSpec(flashgames.PapaLouie3WhenSundaesAttack-v0),
 EnvSpec(Pong-v0),
 EnvSpec(flashgames.StormRage-v0),
 EnvSpec(flashgames.GalleonFight-v0),
 EnvSpec(gym-core.BattleZone30FPS-v0),
 EnvSpec(gym-core.BattleZone30FPS-v3),
 EnvSpec(flashgames.GalacticGems2LevelPack-v0),
 EnvSpec(flashgames.MagicSafari-v0),
 EnvSpec(gym-core.CentipedeDeterministicSync-v3),
 EnvSpec(gym-core.CentipedeDeterministicSync-v0),
 EnvSpec(gym-core.UpNDownSync-v0),
 EnvSpec(gym-core.UpNDownSync-v3),
 EnvSpec(flashgames.RhythmSnake-v0),
 EnvSpec(flashgames.SuperK9-v0),
 EnvSpec(gym-core.KrullNoFrameskip-v3),
 EnvSpec(flashgames.AmericanRacingLvl2-v0),
 EnvSpec(flashgames.GroundBattles-v0),
 EnvSpec(BattleZoneDeterministic-v4),
 EnvSpec(BattleZoneDeterministic-v0),
 EnvSpec(gym-core.Asterix-v0),
 EnvSpec(gym-core.Asterix-v3),
 EnvSpec(flashgames.AmericanRacingLvl19-v0),
 EnvSpec(flashgames.MysteriousPirateJewels-v0),
 EnvSpec(flashgames.GemPop-v0),
 EnvSpec(gym-core.AirRaid-v3),
 EnvSpec(flashgames.JumpOverTheRings-v0),
 EnvSpec(gym-core.SeaquestDeterministic-v3),
 EnvSpec(gym-core.SeaquestDeterministic-v0),
 EnvSpec(ConvergenceControl-v0),
 EnvSpec(flashgames.MonkeyManic-v0),
 EnvSpec(flashgames.SurvivalLab-v0),
 EnvSpec(flashgames.MeerkatMission-v0),
 EnvSpec(gym-core.AlienSync-v0),
 EnvSpec(gym-core.AlienSync-v3),
 EnvSpec(gym-core.SkiingSlow-v3),
 EnvSpec(gym-core.SkiingSlow-v0),
 EnvSpec(BattleZoneNoFrameskip-v0),
 EnvSpec(gym-core.SpaceInvaders30FPS-v3),
 EnvSpec(flashgames.BulletHeaven-v0),
 EnvSpec(BattleZoneNoFrameskip-v4),
 EnvSpec(gym-core.DemonAttackNoFrameskip-v3),
 EnvSpec(gym-core.DemonAttackNoFrameskip-v0),
 EnvSpec(flashgames.MexicoRex-v0),
 EnvSpec(gym-core.CentipedeSlow-v0),
 EnvSpec(gym-core.CentipedeSlow-v3),
 EnvSpec(flashgames.Pyro-v0),
 EnvSpec(TutankhamDeterministic-v0),
 EnvSpec(UpNDown-ramDeterministic-v4),
 EnvSpec(TutankhamDeterministic-v4),
 EnvSpec(UpNDown-ramDeterministic-v0),
 EnvSpec(gym-core.AirRaidDeterministic-v0),
 EnvSpec(gym-core.AirRaidDeterministic-v3),
 EnvSpec(flashgames.ShimmyChute-v0),
 EnvSpec(Jamesbond-ramNoFrameskip-v4),
 EnvSpec(flashgames.SuperbikeRacer-v0),
 EnvSpec(Jamesbond-ramNoFrameskip-v0),
 EnvSpec(gym-core.DoubleDunkDeterministic-v3),
 EnvSpec(gym-core.DoubleDunkDeterministic-v0),
 EnvSpec(flashgames.Krome-v0),
 EnvSpec(gym-core.KrullSlow-v0),
 EnvSpec(gym-core.KrullSlow-v3),
 EnvSpec(wob.mini.ScrollText2-v0),
 EnvSpec(gym-core.IceHockey-v3),
 EnvSpec(gym-core.IceHockey-v0),
 EnvSpec(gym-core.AsterixDeterministic-v3),
 EnvSpec(gym-core.AsterixDeterministic-v0),
 EnvSpec(flashgames.KeeperOfTheGrove3-v0),
 EnvSpec(YarsRevengeDeterministic-v0),
 EnvSpec(YarsRevengeDeterministic-v4),
 EnvSpec(gym-core.StarGunnerDeterministicSlow-v3),
 EnvSpec(gym-core.Tutankham-v3),
 EnvSpec(gym-core.Tutankham-v0),
 EnvSpec(flashgames.CatchTheStar-v0),
 EnvSpec(Bowling-ramNoFrameskip-v0),
 EnvSpec(Bowling-ramNoFrameskip-v4),
 EnvSpec(TwoRoundNondeterministicReward-v0),
 EnvSpec(flashgames.FlashRace-v0),
 EnvSpec(flashgames.TattooArtist-v0),
 EnvSpec(flashgames.FormulaRacer2012Lvl8-v0),
 EnvSpec(flashgames.DumperRush-v0),
 EnvSpec(flashgames.LaserCannon3LevelsPack-v0),
 EnvSpec(flashgames.SlipSlideSloth-v0),
 EnvSpec(flashgames.PaulVaulting-v0),
 EnvSpec(flashgames.CoasterRacer2Lvl2-v0),
 EnvSpec(flashgames.DigToChina-v0),
 EnvSpec(gym-core.Bowling-v0),
 EnvSpec(flashgames.ZombieTdReborn-v0),
 EnvSpec(CrazyClimber-ram-v0),
 EnvSpec(gym-core.KungFuMaster30FPS-v0),
 EnvSpec(CrazyClimber-ram-v4),
 EnvSpec(flashgames.WoollyBearJigsawPuzzle-v0),
 EnvSpec(gym-core.Pooyan30FPS-v3),
 EnvSpec(RoadRunner-ramNoFrameskip-v4),
 EnvSpec(RoadRunner-ramNoFrameskip-v0),
 EnvSpec(flashgames.SuperRallyExtreme-v0),
 EnvSpec(Jamesbond-ramDeterministic-v4),
 EnvSpec(gym-core.Pooyan-v3),
 EnvSpec(BreakoutDeterministic-v4),
 EnvSpec(gym-core.Pooyan30FPS-v0),
 EnvSpec(Jamesbond-ramDeterministic-v0),
 EnvSpec(flashgames.AmericanRacingLvl14-v0),
 EnvSpec(BreakoutDeterministic-v0),
 EnvSpec(flashgames.CoasterRacer2Lvl3-v0),
 EnvSpec(flashgames.HeatRushFutureLvl14-v0),
 EnvSpec(flashgames.HeatRushFutureLvl15-v0),
 EnvSpec(flashgames.BottleCaps-v0),
 EnvSpec(gym-core.MontezumaRevengeSlow-v3),
 EnvSpec(SeaquestNoFrameskip-v0),
 EnvSpec(SeaquestNoFrameskip-v4),
 EnvSpec(flashgames.LearnToFlyIdle-v0),
 EnvSpec(gym-core.Asteroids30FPS-v0),
 EnvSpec(flashgames.JollySwipeLevelPack-v0),
 EnvSpec(gym-core.Asteroids30FPS-v3),
 EnvSpec(flashgames.Kinetikz3-v0),
 EnvSpec(flashgames.PiratesAndCannons-v0),
 EnvSpec(wob.real.Signup-14-v0),
 EnvSpec(flashgames.TaxiInc-v0),
 EnvSpec(flashgames.TankStorm2-v0),
 EnvSpec(gym-core.AirRaidDeterministicSync-v3),
 EnvSpec(gym-core.TennisSlow-v3),
 EnvSpec(gym-core.TennisSlow-v0),
 EnvSpec(gym-core.AirRaidDeterministicSync-v0),
 EnvSpec(flashgames.JetpackJackride-v0),
 EnvSpec(AirRaid-ram-v0),
 EnvSpec(Carnival-ram-v0),
 EnvSpec(AirRaid-ram-v4),
 EnvSpec(gym-core.AsteroidsDeterministicSlow-v0),
 EnvSpec(gym-core.AsteroidsDeterministicSlow-v3),
 EnvSpec(Carnival-ram-v4),
 EnvSpec(flashgames.MatchStars-v0),
 EnvSpec(flashgames.GunpowderAndFeathers-v0),
 EnvSpec(gym-core.VideoPinball-v0),
 EnvSpec(BeamRiderDeterministic-v0),
 EnvSpec(flashgames.TheTowerman-v0),
 EnvSpec(BeamRiderDeterministic-v4),
 EnvSpec(flashgames.FormulaRacerLvl5-v0),
 EnvSpec(gym-core.Centipede-v0),
 EnvSpec(gym-core.Centipede-v3),
 EnvSpec(Phoenix-ramDeterministic-v0),
 EnvSpec(gym-core.KrullSync-v3),
 EnvSpec(gym-core.KrullSync-v0),
 EnvSpec(gym-core.VentureSync-v0),
 EnvSpec(Phoenix-ramDeterministic-v4),
 EnvSpec(flashgames.BubbleShooterChallenge-v0),
 EnvSpec(flashgames.BubbleBlubbs-v0),
 EnvSpec(gym-core.KungFuMasterSync-v0),
 EnvSpec(gym-core.KungFuMasterSync-v3),
 EnvSpec(gym-core.AmidarSync-v3),
 EnvSpec(Boxing-ramDeterministic-v4),
 EnvSpec(gym-core.AmidarSync-v0),
 EnvSpec(Boxing-ramDeterministic-v0),
 EnvSpec(flashgames.WaveLucha-v0),
 EnvSpec(flashgames.EvolutionRacingLvl11-v0),
 EnvSpec(gym-core.ChopperCommandSync-v0),
 EnvSpec(gym-core.ElevatorActionDeterministicSlow-v0),
 EnvSpec(gym-core.ElevatorActionDeterministicSlow-v3),
 EnvSpec(flashgames.SpaceMadness-v0),
 EnvSpec(flashgames.KingRolla-v0),
 EnvSpec(gym-core.BerzerkSlow-v0),
 EnvSpec(Thrower-v0),
 EnvSpec(flashgames.BikeTrial3-v0),
 EnvSpec(flashgames.ChickCannont-v0),
 EnvSpec(flashgames.BearInSuperActionAdventure-v0),
 EnvSpec(gym-core.VideoPinballSync-v3),
 EnvSpec(gym-core.VideoPinballSync-v0),
 EnvSpec(flashgames.Thaw-v0),
 EnvSpec(flashgames.NewSiberianSupercarsRacing-v0),
 EnvSpec(flashgames.MinedigJourneyToHollowEarth-v0),
 EnvSpec(gym-core.BankHeistSync-v0),
 EnvSpec(gym-core.SpaceInvaders-v3),
 EnvSpec(flashgames.NeonRace2Lvl9-v0),
 EnvSpec(gym-core.NameThisGameDeterministicSync-v3),
 EnvSpec(gym-core.NameThisGameDeterministicSync-v0),
 EnvSpec(flashgames.Foosball2Player-v0),
 EnvSpec(gym-core.BattleZoneNoFrameskip-v0),
 EnvSpec(flashgames.SuperShinyheadHarderThanFlappyBird-v0),
 EnvSpec(gym-core.RoadRunnerNoFrameskip-v3),
 EnvSpec(gym-core.RoadRunnerNoFrameskip-v0),
 EnvSpec(gym-core.BreakoutSync-v0),
 EnvSpec(gym-core.BreakoutSync-v3),
 EnvSpec(flashgames.BullfrogJigsawPuzzle-v0),
 EnvSpec(flashgames.TheSilentPlanet-v0),
 EnvSpec(BerzerkDeterministic-v4),
 EnvSpec(flashgames.EuroKicks2016-v0),
 EnvSpec(wob.real.Quizlet-Solar-System-Learn-v0),
 EnvSpec(BerzerkDeterministic-v0),
 EnvSpec(AssaultNoFrameskip-v0),
 EnvSpec(flashgames.TouchTheSky-v0),
 EnvSpec(AssaultNoFrameskip-v4),
 EnvSpec(PhoenixNoFrameskip-v4),
 EnvSpec(gym-core.DoubleDunkSlow-v3),
 EnvSpec(PhoenixNoFrameskip-v0),
 EnvSpec(flashgames.IdleChop-v0),
 EnvSpec(gym-core.YarsRevenge-v3),
 EnvSpec(gym-core.AlienNoFrameskip-v0),
 EnvSpec(Humanoid-v1),
 EnvSpec(gym-core.AsteroidsSync-v3),
 EnvSpec(wob.mini.FindMidpoint-v0),
 EnvSpec(flashgames.DartsSim-v0),
 EnvSpec(flashgames.SmileyShowdown-v0),
 EnvSpec(flashgames.NeonRace2Lvl8-v0),
 EnvSpec(flashgames.SneakyScubaEscape-v0),
 EnvSpec(flashgames.SuperDash-v0),
 EnvSpec(flashgames.MummyMadness-v0),
 EnvSpec(gym-core.RobotankSlow-v0),
 EnvSpec(flashgames.HoldTheFort-v0),
 EnvSpec(KungFuMasterNoFrameskip-v0),
 EnvSpec(Frostbite-ramDeterministic-v4),
 EnvSpec(Frostbite-ramDeterministic-v0),
 EnvSpec(gym-core.VentureDeterministicSync-v3),
 EnvSpec(gym-core.VentureDeterministicSync-v0),
 EnvSpec(flashgames.FairyDefense-v0),
 EnvSpec(gym-core.RobotankSync-v0),
 EnvSpec(gym-core.RobotankSync-v3),
 EnvSpec(Qbert-ramNoFrameskip-v0),
 EnvSpec(Ant-v1),
 EnvSpec(Qbert-ramNoFrameskip-v4),
 EnvSpec(gym-core.Seaquest-v0),
 EnvSpec(gym-core.Seaquest-v3),
 EnvSpec(YarsRevenge-ram-v0),
 EnvSpec(YarsRevenge-ram-v4),
 EnvSpec(flashgames.IceBlock-v0),
 EnvSpec(FishingDerby-ram-v0),
 EnvSpec(Enduro-ramNoFrameskip-v4),
 EnvSpec(FrostbiteNoFrameskip-v0),
 EnvSpec(FishingDerby-ram-v4),
 EnvSpec(Enduro-ramNoFrameskip-v0),
 EnvSpec(FrostbiteNoFrameskip-v4),
 EnvSpec(gym-core.MsPacmanDeterministic-v3),
 EnvSpec(wob.mini.ClickColor-v0),
 EnvSpec(gym-core.MsPacmanDeterministic-v0),
 EnvSpec(flashgames.Dots-v0),
 EnvSpec(flashgames.NeonRace2Lvl6-v0),
 EnvSpec(gym-core.JamesbondDeterministicSlow-v3),
 EnvSpec(flashgames.NeonRace2Lvl12-v0),
 EnvSpec(gym-core.ElevatorActionDeterministicSync-v3),
 EnvSpec(gym-core.ElevatorActionDeterministicSync-v0),
 EnvSpec(flashgames.PixelPurge-v0),
 EnvSpec(flashgames.ReleaseTheMooks-v0),
 EnvSpec(gym-core.StarGunnerSlow-v3),
 EnvSpec(gym-core.StarGunnerSlow-v0),
 EnvSpec(flashgames.SapphireClix-v0),
 EnvSpec(gym-core.MontezumaRevengeDeterministicSync-v0),
 EnvSpec(flashgames.SlingBaby-v0),
 EnvSpec(gym-core.MontezumaRevengeDeterministicSync-v3),
 EnvSpec(gym-core.KungFuMasterDeterministicSlow-v3),
 EnvSpec(StarGunnerDeterministic-v0),
 EnvSpec(StarGunnerDeterministic-v4),
 EnvSpec(flashgames.SandcastleShowdown-v0),
 EnvSpec(flashgames.CharlieTheDuck-v0),
 EnvSpec(CNNClassifierTraining-v0),
 EnvSpec(wob.mini.EnterText-v0),
 EnvSpec(gym-core.BankHeistDeterministic-v3),
 EnvSpec(flashgames.CanyonValleyRally-v0),
 EnvSpec(gym-core.VentureDeterministic-v0),
 EnvSpec(Boxing-ram-v0),
 EnvSpec(Boxing-ram-v4),
 EnvSpec(gym-core.VentureDeterministic-v3),
 EnvSpec(flashgames.WackyStrike-v0),
 EnvSpec(flashgames.EasterEggSlider-v0),
 EnvSpec(flashgames.MindImpulse-v0),
 EnvSpec(flashgames.NeonRace2Lvl13-v0),
 EnvSpec(flashgames.FormulaRacer-v0),
 EnvSpec(flashgames.Hamsterball-v0),
 EnvSpec(gym-core.KungFuMasterDeterministicSync-v0),
 EnvSpec(Assault-ram-v0),
 EnvSpec(wob.mini.NumberCheckboxes-v0),
 EnvSpec(Assault-ram-v4),
 EnvSpec(gym-core.BreakoutDeterministic-v0),
 EnvSpec(gym-core.BreakoutDeterministic-v3),
 EnvSpec(flashgames.Offroaders2-v0),
 EnvSpec(gym-core.JamesbondDeterministicSync-v3),
 EnvSpec(gym-core.JamesbondDeterministicSync-v0),
 EnvSpec(flashgames.SnowQueen4-v0),
 EnvSpec(flashgames.ToyWarAngryRobotDog-v0),
 EnvSpec(flashgames.HighwayRevenge-v0),
 EnvSpec(flashgames.CarrotFantasyExtreme3-v0),
 EnvSpec(Solaris-ramDeterministic-v4),
 EnvSpec(ElevatorActionDeterministic-v4),
 EnvSpec(Solaris-ramDeterministic-v0),
 EnvSpec(ElevatorActionDeterministic-v0),
 EnvSpec(Solaris-v4),
 EnvSpec(Solaris-v0),
 EnvSpec(flashgames.BlockysEscape-v0),
 EnvSpec(gym-core.StarGunnerDeterministic-v3),
 EnvSpec(flashgames.HeroesOfMangaraTheFrostCrown-v0),
 EnvSpec(SolarisNoFrameskip-v0),
 EnvSpec(gym-core.PongDeterministic-v0),
 EnvSpec(SolarisNoFrameskip-v4),
 EnvSpec(RoadRunner-v4),
 EnvSpec(flashgames.ShortCircuit-v0),
 EnvSpec(RoadRunner-v0),
 EnvSpec(gym-core.AssaultDeterministicSlow-v0),
 EnvSpec(flashgames.SupercarDomination-v0),
 EnvSpec(gym-core.AssaultDeterministicSlow-v3),
 EnvSpec(flashgames.Xmatch2016-v0),
 EnvSpec(RoadRunner-ram-v0),
 EnvSpec(flashgames.CrystalCurse-v0),
 EnvSpec(RoadRunner-ram-v4),
 EnvSpec(gym-core.AsteroidsSlow-v3),
 EnvSpec(gym-core.AsteroidsSlow-v0),
 EnvSpec(Skiing-ramDeterministic-v0),
 EnvSpec(Skiing-ramDeterministic-v4),
 EnvSpec(flashgames.BlackRacerJigsawPuzzle-v0),
 EnvSpec(VideoPinball-ram-v0),
 EnvSpec(VideoPinball-ram-v4),
 EnvSpec(flashgames.DaymareInvaders-v0),
 EnvSpec(gym-core.GravitarNoFrameskip-v0),
 EnvSpec(FreewayNoFrameskip-v4),
 EnvSpec(WizardOfWorNoFrameskip-v0),
 EnvSpec(FreewayNoFrameskip-v0),
 EnvSpec(WizardOfWorNoFrameskip-v4),
 EnvSpec(flashgames.UnderwaterSecrets-v0),
 EnvSpec(gym-core.ElevatorAction30FPS-v0),
 EnvSpec(gym-core.ElevatorAction30FPS-v3),
 EnvSpec(flashgames.OkParking-v0),
 EnvSpec(flashgames.HeatRushFuture-v0),
 EnvSpec(gym-core.Venture-v0),
 EnvSpec(DoubleDunk-v4),
 EnvSpec(gym-core.MsPacmanDeterministicSync-v0),
 EnvSpec(gym-core.MsPacmanDeterministicSync-v3),
 EnvSpec(flashgames.WorldsGuard2-v0),
 EnvSpec(gym-core.BattleZoneDeterministicSync-v3),
 EnvSpec(gym-core.BattleZoneDeterministicSync-v0),
 EnvSpec(gym-core.VideoPinballNoFrameskip-v0),
 EnvSpec(gym-core.VideoPinballNoFrameskip-v3),
 EnvSpec(gym-core.JourneyEscapeNoFrameskip-v0),
 EnvSpec(HeroDeterministic-v4),
 EnvSpec(gym-core.JourneyEscapeNoFrameskip-v3),
 EnvSpec(HeroDeterministic-v0),
 EnvSpec(ZaxxonDeterministic-v0),
 EnvSpec(gym-core.PrivateEyeDeterministic-v0),
 EnvSpec(gym-core.PrivateEyeDeterministic-v3),
 EnvSpec(flashgames.Colorwars-v0),
 EnvSpec(gym-core.MsPacmanNoFrameskip-v0),
 EnvSpec(gym-core.JamesbondSlow-v0),
 EnvSpec(gym-core.JamesbondSlow-v3),
 EnvSpec(gym-core.MsPacmanNoFrameskip-v3),
 EnvSpec(gym-core.RiverraidSlow-v3),
 EnvSpec(gym-core.CartPole-v0),
 EnvSpec(flashgames.GalaxyMission-v0),
 EnvSpec(BeamRider-v4),
 EnvSpec(flashgames.SuperBoxotron2000-v0),
 EnvSpec(FishingDerby-ramDeterministic-v0),
 EnvSpec(FishingDerby-ramDeterministic-v4),
 EnvSpec(flashgames.Stratega-v0),
 EnvSpec(gym-core.AsteroidsNoFrameskip-v3),
 EnvSpec(CrazyClimber-v4),
 EnvSpec(flashgames.ParticleWarsExtreme-v0),
 EnvSpec(gym-core.AsteroidsNoFrameskip-v0),
 EnvSpec(CrazyClimber-v0),
 EnvSpec(flashgames.HexBattles-v0),
 EnvSpec(gym-core.BoxingNoFrameskip-v3),
 EnvSpec(Pitfall-ramDeterministic-v0),
 EnvSpec(flashgames.BumbleTumble-v0),
 EnvSpec(gym-core.BoxingNoFrameskip-v0),
 EnvSpec(Pitfall-ramDeterministic-v4),
 EnvSpec(wob.real.Quizlet-Geography-Test-v0),
 EnvSpec(gym-core.SkiingSync-v0),
 EnvSpec(gym-core.SkiingSync-v3),
 EnvSpec(GravitarDeterministic-v4),
 EnvSpec(flashgames.CoasterRacerLvl5-v0),
 EnvSpec(GravitarDeterministic-v0),
 EnvSpec(gym-core.BowlingNoFrameskip-v3),
 EnvSpec(gym-core.BowlingNoFrameskip-v0),
 EnvSpec(flashgames.EvilSun-v0),
 EnvSpec(flashgames.HalloweenJam-v0),
 EnvSpec(flashgames.LlamasInDistress-v0),
 EnvSpec(gym-core.PooyanDeterministic-v0),
 EnvSpec(flashgames.WreckRoad-v0),
 EnvSpec(wob.real.Signup-6-v0),
 EnvSpec(flashgames.EvolutionRacingLvl9-v0),
 EnvSpec(gym-core.ElevatorActionSlow-v0),
 EnvSpec(gym-core.ElevatorActionSlow-v3),
 EnvSpec(wob.real.Signup-7-v0),
 EnvSpec(flashgames.SistersOfNoMercy-v0),
 EnvSpec(Tennis-v0),
 EnvSpec(flashgames.CoasterRacerLvl4-v0),
 EnvSpec(flashgames.EvolutionRacingLvl4-v0),
 EnvSpec(gym-core.RoadRunner-v0),
 EnvSpec(gym-core.RoadRunner-v3),
 EnvSpec(Carnival-ramDeterministic-v4),
 EnvSpec(gym-core.Carnival-v0),
 EnvSpec(flashgames.CursedTreasureDontTouchMyGems-v0),
 EnvSpec(gym-core.Carnival-v3),
 EnvSpec(flashgames.FlappyBat-v0),
 EnvSpec(wob.mini.ResizeTextarea-v0),
 EnvSpec(gym-core.SpaceInvadersNoFrameskip-v3),
 EnvSpec(TennisNoFrameskip-v0),
 EnvSpec(gym-core.PhoenixNoFrameskip-v3),
 EnvSpec(gym-core.PhoenixNoFrameskip-v0),
 EnvSpec(gym-core.Alien30FPS-v3),
 EnvSpec(gym-core.Alien30FPS-v0),
 EnvSpec(flashgames.TheOneForkRestaurantDx-v0),
 EnvSpec(DemonAttack-ram-v0),
 EnvSpec(DemonAttack-ram-v4),
 EnvSpec(gym-core.KungFuMasterNoFrameskip-v0),
 EnvSpec(gym-core.KungFuMasterNoFrameskip-v3),
 EnvSpec(flashgames.SpacePunkRacer-v0),
 EnvSpec(TennisNoFrameskip-v4),
 EnvSpec(flashgames.FishAndDestroy-v0),
 EnvSpec(flashgames.KartRacing-v0),
 EnvSpec(flashgames.JellySnake-v0),
 EnvSpec(gym-core.AsterixSlow-v0),
 EnvSpec(gym-core.AsterixSlow-v3),
 EnvSpec(flashgames.FishEatFish-v0),
 EnvSpec(flashgames.Jumprunner-v0),
 EnvSpec(flashgames.HoleInOne-v0),
 EnvSpec(flashgames.AwesomeRun2-v0),
 EnvSpec(gym-core.BattleZoneSlow-v0),
 EnvSpec(gym-core.BattleZoneSlow-v3),
 EnvSpec(flashgames.IntoSpace-v0),
 EnvSpec(flashgames.CarsVsRobots-v0),
 EnvSpec(flashgames.BubbleHitPonyParade-v0),
 EnvSpec(flashgames.AWeekendAtTweetys-v0),
 EnvSpec(gym-core.DoubleDunkDeterministicSync-v0),
 EnvSpec(gym-core.DoubleDunkDeterministicSync-v3),
 EnvSpec(Amidar-v4),
 EnvSpec(flashgames.ModelCarRacing-v0),
 EnvSpec(gym-core.ChopperCommandDeterministicSync-v3),
 EnvSpec(gym-core.ChopperCommandDeterministicSync-v0),
 EnvSpec(flashgames.PirateRunAway-v0),
 EnvSpec(flashgames.MonsterLabFeedThemAll-v0),
 EnvSpec(Gravitar-ramDeterministic-v0),
 EnvSpec(gym-core.DemonAttack30FPS-v0),
 EnvSpec(flashgames.ViewtifulFightClub2-v0),
 EnvSpec(Gravitar-ramDeterministic-v4),
 EnvSpec(BattleZone-ram-v4),
 EnvSpec(BattleZone-ram-v0),
 EnvSpec(IceHockey-ram-v0),
 EnvSpec(IceHockey-ram-v4),
 EnvSpec(flashgames.DragonChronicles-v0),
 EnvSpec(gym-core.PitfallDeterministic-v0),
 EnvSpec(gym-core.PitfallDeterministic-v3),
 EnvSpec(flashgames.SnowPrincessMakeup-v0),
 EnvSpec(flashgames.QubeyTheCube-v0),
 EnvSpec(gym-core.Alien-v3),
 EnvSpec(gym-core.Alien-v0),
 EnvSpec(gym-core.AtlantisSync-v0),
 EnvSpec(gym-core.AtlantisSync-v3),
 EnvSpec(flashgames.TowerMoon-v0),
 EnvSpec(flashgames.MotherLoad-v0),
 EnvSpec(wob.mini.EnterTextDynamic-v0),
 EnvSpec(flashgames.BubbleGlee-v0),
 EnvSpec(flashgames.FirefighterCannon-v0),
 EnvSpec(gym-core.BerzerkDeterministicSlow-v0),
 EnvSpec(gym-core.BerzerkDeterministicSlow-v3),
 EnvSpec(DoubleDunk-ramNoFrameskip-v0),
 EnvSpec(gym-core.IceHockeySlow-v3),
 EnvSpec(gym-core.IceHockeySlow-v0),
 EnvSpec(AssaultDeterministic-v0),
 EnvSpec(DoubleDunk-ramNoFrameskip-v4),
 EnvSpec(MsPacman-v0),
 EnvSpec(flashgames.Offroaders-v0),
 EnvSpec(MsPacman-v4),
 EnvSpec(flashgames.HiredHeroes-v0),
 EnvSpec(flashgames.WolfSpiderJigsawPuzzle-v0),
 EnvSpec(AssaultDeterministic-v4),
 EnvSpec(gym-core.FreewayDeterministic-v3),
 EnvSpec(flashgames.ToonEscapeMaze-v0),
 EnvSpec(wob.mini.ClickCheckboxes-v0),
 EnvSpec(gym-core.FreewayDeterministic-v0),
 EnvSpec(Seaquest-ramNoFrameskip-v0),
 EnvSpec(Seaquest-ramNoFrameskip-v4),
 EnvSpec(Blackjack-v0),
 EnvSpec(TennisDeterministic-v0),
 EnvSpec(TennisDeterministic-v4),
 EnvSpec(Atlantis-v4),
 EnvSpec(Atlantis-v0),
 EnvSpec(UpNDownDeterministic-v0),
 EnvSpec(flashgames.WarBerlinIdle-v0),
 EnvSpec(gym-core.BattleZoneSync-v3),
 EnvSpec(gym-core.Centipede30FPS-v0),
 EnvSpec(gym-core.Centipede30FPS-v3),
 EnvSpec(gym-core.BattleZoneSync-v0),
 EnvSpec(gym-core.FishingDerby-v0),
 EnvSpec(Asteroids-v0),
 EnvSpec(Asteroids-v4),
 EnvSpec(gym-core.SpaceInvaders30FPS-v0),
 EnvSpec(UpNDownDeterministic-v4),
 EnvSpec(IceHockeyDeterministic-v4),
 EnvSpec(flashgames.SpectrumHeist-v0),
 EnvSpec(gym-core.BattleZoneNoFrameskip-v3),
 EnvSpec(IceHockeyDeterministic-v0),
 EnvSpec(gym-core.EnduroNoFrameskip-v3),
 EnvSpec(gym-core.EnduroNoFrameskip-v0),
 EnvSpec(flashgames.TankStorm3-v0),
 EnvSpec(gym-core.BeamRiderDeterministic-v0),
 EnvSpec(gym-core.BeamRiderDeterministic-v3),
 EnvSpec(flashgames.Infinitix-v0),
 EnvSpec(flashgames.PoliceInterceptor-v0),
 EnvSpec(gym-core.CrazyClimber-v3),
 EnvSpec(wob.real.ClickButton-Airfrance-v0),
 EnvSpec(gym-core.CrazyClimber-v0),
 EnvSpec(flashgames.Autoattack-v0),
 EnvSpec(flashgames.CircuitSuperCarsRacing-v0),
 EnvSpec(flashgames.HeatRushUsaLvl8-v0),
 EnvSpec(flashgames.Blix-v0),
 EnvSpec(WizardOfWorDeterministic-v4),
 EnvSpec(gym-core.RoadRunnerDeterministic-v0),
 EnvSpec(gym-core.RoadRunnerDeterministic-v3),
 EnvSpec(WizardOfWorDeterministic-v0),
 EnvSpec(gym-core.IceHockeyNoFrameskip-v3),
 EnvSpec(gym-core.IceHockeyNoFrameskip-v0),
 EnvSpec(flashgames.ElClassico-v0),
 EnvSpec(DoubleDunk-v0),
 EnvSpec(LunarLander-v2),
 EnvSpec(MsPacman-ramDeterministic-v4),
 EnvSpec(flashgames.Neopods-v0),
 EnvSpec(MsPacman-ramDeterministic-v0),
 EnvSpec(flashgames.TutiFruti-v0),
 EnvSpec(flashgames.WhatsInsideTheBox-v0),
 EnvSpec(BoxingDeterministic-v0),
 EnvSpec(BoxingDeterministic-v4),
 EnvSpec(gym-core.PooyanDeterministicSlow-v3),
 EnvSpec(gym-core.PooyanDeterministicSlow-v0),
 EnvSpec(flashgames.3dMuscleCarRacer-v0),
 EnvSpec(flashgames.ColorZapper-v0),
 EnvSpec(Robotank-v4),
 EnvSpec(Robotank-v0),
 EnvSpec(flashgames.HeroSimulator-v0),
 EnvSpec(wob.mini.ClickButton-v0),
 EnvSpec(wob.mini.SimpleArithmetic-v0),
 EnvSpec(flashgames.FormulaRacer2012Lvl11-v0),
 EnvSpec(flashgames.SuperRallyChallenge2-v0),
 EnvSpec(flashgames.AmericanRacing2-v0),
 EnvSpec(gym-core.SolarisSync-v0),
 EnvSpec(gym-core.SolarisSync-v3),
 EnvSpec(Gravitar-ram-v0),
 EnvSpec(Frostbite-v4),
 EnvSpec(Gravitar-ram-v4),
 EnvSpec(Frostbite-v0),
 EnvSpec(Acrobot-v1),
 EnvSpec(gym-core.FrostbiteDeterministic-v3),
 EnvSpec(gym-core.FrostbiteDeterministic-v0),
 EnvSpec(wob.mini.ClickDialog-v0),
 EnvSpec(flashgames.Wheelers-v0),
 EnvSpec(starcraft.TerranAstralBalance-v0),
 EnvSpec(ZaxxonNoFrameskip-v0),
 EnvSpec(HeroNoFrameskip-v4),
 EnvSpec(flashgames.BubbleSlasher-v0),
 EnvSpec(ZaxxonNoFrameskip-v4),
 EnvSpec(flashgames.AmericanRacingLvl3-v0),
 EnvSpec(HeroNoFrameskip-v0),
 EnvSpec(NameThisGame-v0),
 EnvSpec(flashgames.FormulaRacer2012Lvl10-v0),
 EnvSpec(flashgames.CoverOrangeJourneyGangsters-v0),
 EnvSpec(NameThisGame-v4),
 EnvSpec(flashgames.PaintWars-v0),
 EnvSpec(gym-core.StarGunner-v0),
 EnvSpec(flashgames.AchilliaTheGame-v0),
 EnvSpec(gym-core.StarGunner-v3),
 EnvSpec(flashgames.KnightsOfRock-v0),
 EnvSpec(gym-core.Jamesbond30FPS-v0),
 EnvSpec(gym-core.Jamesbond30FPS-v3),
 EnvSpec(flashgames.GalacticCats-v0),
 EnvSpec(Krull-ram-v0),
 EnvSpec(Krull-ram-v4),
 EnvSpec(flashgames.DeathDiceOverdose-v0),
 EnvSpec(flashgames.BubbleRubble-v0),
 EnvSpec(BowlingNoFrameskip-v0),
 EnvSpec(flashgames.ImitationNationSnakeGame-v0),
 EnvSpec(gym-core.CentipedeSync-v3),
 EnvSpec(BowlingNoFrameskip-v4),
 EnvSpec(gym-core.CentipedeSync-v0),
 EnvSpec(flashgames.IceRun-v0),
 EnvSpec(flashgames.Madburger3-v0),
 EnvSpec(gym-core.MontezumaRevengeDeterministic-v3),
 EnvSpec(flashgames.GsSoccerWorldCup-v0),
 EnvSpec(gym-core.MontezumaRevengeDeterministic-v0),
 EnvSpec(flashgames.NeonRaceLvl3-v0),
 EnvSpec(Assault-ramNoFrameskip-v0),
 EnvSpec(Assault-ramNoFrameskip-v4),
 EnvSpec(BankHeist-ram-v0),
 EnvSpec(flashgames.HungryPiranha-v0),
 EnvSpec(BankHeist-ram-v4),
 EnvSpec(flashgames.DoodleGod2Walkthrough-v0),
 EnvSpec(SpaceInvaders-ram-v0),
 EnvSpec(SpaceInvaders-ram-v4),
 EnvSpec(FishingDerby-v4),
 EnvSpec(FishingDerby-v0),
 EnvSpec(flashgames.TheBoomlandsWorldWars-v0),
 EnvSpec(gym-core.GravitarDeterministicSync-v0),
 EnvSpec(Freeway-ramDeterministic-v4),
 EnvSpec(gym-core.GravitarDeterministicSync-v3),
 EnvSpec(DoubleDunkNoFrameskip-v0),
 EnvSpec(flashgames.Helixteus-v0),
 EnvSpec(DoubleDunkNoFrameskip-v4),
 EnvSpec(flashgames.SuperPuzzlePlatformer-v0),
 EnvSpec(flashgames.MatchAndCrash-v0),
 EnvSpec(AsterixNoFrameskip-v0),
 EnvSpec(AsterixNoFrameskip-v4),
 EnvSpec(flashgames.MatchCraft-v0),
 EnvSpec(SpaceInvaders-ramDeterministic-v4),
 EnvSpec(SpaceInvaders-ramDeterministic-v0),
 EnvSpec(JourneyEscapeDeterministic-v0),
 EnvSpec(flashgames.HandsOff-v0),
 EnvSpec(flashgames.Zevil2-v0),
 EnvSpec(JourneyEscapeDeterministic-v4),
 EnvSpec(flashgames.Paintwars-v0),
 EnvSpec(gym-core.PooyanSync-v0),
 EnvSpec(gym-core.PooyanSync-v3),
 EnvSpec(flashgames.MasterDifference-v0),
 EnvSpec(BeamRider-ram-v0),
 EnvSpec(BeamRider-ram-v4),
 EnvSpec(gym-core.ChopperCommandSlow-v0),
 EnvSpec(gym-core.ChopperCommandSlow-v3),
 EnvSpec(gym-core.Bowling30FPS-v0),
 EnvSpec(gym-core.Bowling30FPS-v3),
 EnvSpec(flashgames.Overheat-v0),
 EnvSpec(flashgames.GravityThruster-v0),
 EnvSpec(flashgames.NeonRaceLvl2-v0),
 EnvSpec(gtav.Speed-v0),
 EnvSpec(flashgames.TowerEmpire-v0),
 EnvSpec(Freeway-ramNoFrameskip-v4),
 EnvSpec(Freeway-ramNoFrameskip-v0),
 EnvSpec(PitfallDeterministic-v4),
 EnvSpec(flashgames.FormulaRacerLvl6-v0),
 EnvSpec(flashgames.HappyBallz-v0),
 EnvSpec(PitfallDeterministic-v0),
 EnvSpec(flashgames.Flagman-v0),
 EnvSpec(flashgames.PiggysCupcakeQuest-v0),
 EnvSpec(gym-core.KungFuMaster-v3),
 EnvSpec(gym-core.KungFuMaster-v0),
 EnvSpec(flashgames.AmericanRacingLvl21-v0),
 EnvSpec(flashgames.CoasterRacer-v0),
 EnvSpec(gym-core.PooyanNoFrameskip-v0),
 EnvSpec(flashgames.JungleCrash-v0),
 EnvSpec(gym-core.PooyanNoFrameskip-v3),
 EnvSpec(gym-core.RiverraidSync-v0),
 EnvSpec(gym-core.RiverraidSync-v3),
 EnvSpec(flashgames.ToyRacers-v0),
 EnvSpec(gym-core.JamesbondNoFrameskip-v0),
 EnvSpec(flashgames.DrinkBeerNeglectFamily-v0),
 EnvSpec(gym-core.ZaxxonDeterministic-v0),
 EnvSpec(gym-core.ZaxxonDeterministic-v3),
 EnvSpec(gym-core.Seaquest30FPS-v3),
 EnvSpec(gym-core.Seaquest30FPS-v0),
 EnvSpec(flashgames.FormulaRacer2012Lvl6-v0),
 EnvSpec(flashgames.TheCubicMonkeyAdventures2-v0),
 EnvSpec(flashgames.PlaneRace2-v0),
 EnvSpec(flashgames.Cruisin-v0),
 EnvSpec(DuplicatedInput-v0),
 EnvSpec(flashgames.WarOfTheShard-v0),
 EnvSpec(flashgames.UnfreezeMe3-v0),
 EnvSpec(flashgames.AnotherLife2-v0),
 EnvSpec(flashgames.HeatRushUsaLvl2-v0),
 EnvSpec(wob.mini.EnterTime-v0),
 EnvSpec(DemonAttack-ramNoFrameskip-v0),
 EnvSpec(gym-core.StarGunnerNoFrameskip-v3),
 EnvSpec(DemonAttack-ramNoFrameskip-v4),
 EnvSpec(flashgames.ExperimentalShooter2-v0),
 EnvSpec(flashgames.DodgeAndCrash-v0),
 EnvSpec(flashgames.BoxRacers-v0),
 EnvSpec(flashgames.IndependenceDaySlacking2015-v0),
 EnvSpec(flashgames.UltimateLegend-v0),
 EnvSpec(gym-core.CarnivalSlow-v0),
 EnvSpec(gym-core.CarnivalSlow-v3),
 EnvSpec(flashgames.RainbowDrops-v0),
 EnvSpec(gym-core.KungFuMasterDeterministic-v3),
 EnvSpec(gym-core.FreewaySync-v0),
 EnvSpec(gym-core.FreewaySync-v3),
 EnvSpec(gym-core.KungFuMasterDeterministic-v0),
 EnvSpec(flashgames.HyperTravel-v0),
 EnvSpec(flashgames.RiseOfChampions-v0),
 EnvSpec(flashgames.NadiasRage-v0),
 EnvSpec(flashgames.TrickyRick-v0),
 EnvSpec(flashgames.FredFigglehorn-v0),
 EnvSpec(flashgames.FormulaRacer2012Lvl7-v0),
 EnvSpec(flashgames.FlyingKiwi-v0),
 EnvSpec(flashgames.PickUpTruckRacing-v0),
 EnvSpec(flashgames.RhythmBlasterV2-v0),
 EnvSpec(flashgames.HeavyLegion2-v0),
 EnvSpec(gym-core.PrivateEyeDeterministicSlow-v0),
 EnvSpec(gym-core.AlienDeterministicSlow-v3),
 EnvSpec(gym-core.AlienDeterministicSlow-v0),
 EnvSpec(gym-core.PrivateEyeDeterministicSlow-v3),
 EnvSpec(gym-core.BattleZoneDeterministicSlow-v0),
 EnvSpec(flashgames.MushyMishy-v0),
 EnvSpec(flashgames.BombIt4-v0),
 EnvSpec(flashgames.MonkeyGems-v0),
 EnvSpec(gym-core.YarsRevengeDeterministicSync-v0),
 EnvSpec(flashgames.TechnoMania-v0),
 EnvSpec(gym-core.YarsRevengeDeterministicSync-v3),
 EnvSpec(flashgames.Mushbooms-v0),
 EnvSpec(RiverraidDeterministic-v4),
 EnvSpec(gym-core.BattleZoneDeterministicSlow-v3),
 EnvSpec(RiverraidDeterministic-v0),
 EnvSpec(flashgames.TumbleTiles-v0),
 EnvSpec(wob.mini.ChooseList-v0),
 EnvSpec(flashgames.Basement-v0),
 EnvSpec(flashgames.BikeTrial4-v0),
 EnvSpec(gym-core.CarnivalDeterministicSlow-v0),
 EnvSpec(gym-core.CarnivalDeterministicSlow-v3),
 EnvSpec(gym-core.CrazyClimberNoFrameskip-v3),
 EnvSpec(gym-core.PrivateEyeDeterministicSync-v3),
 EnvSpec(gym-core.PrivateEyeDeterministicSync-v0),
 EnvSpec(flashgames.HeavenAndHell-v0),
 EnvSpec(flashgames.JamesTheSpaceZebra-v0),
 EnvSpec(flashgames.GoGreenGo-v0),
 EnvSpec(KellyCoinflipGeneralized-v0),
 EnvSpec(gym-core.UpNDown-v0),
 EnvSpec(flashgames.CaptainNutty-v0),
 EnvSpec(flashgames.TheProfessionals3-v0),
 EnvSpec(gym-core.SolarisNoFrameskip-v0),
 EnvSpec(gym-core.SolarisNoFrameskip-v3),
 EnvSpec(flashgames.PickAndDig2-v0),
 EnvSpec(flashgames.EasterBunnyCollectCarrots-v0),
 EnvSpec(Gopher-ram-v0),
 EnvSpec(flashgames.Tosuta-v0),
 EnvSpec(Gopher-ram-v4),
 EnvSpec(flashgames.StickyNinjaMissions-v0),
 EnvSpec(CarRacing-v0),
 EnvSpec(flashgames.DiscoverEurope-v0),
 EnvSpec(gym-core.VideoPinballDeterministicSync-v3),
 EnvSpec(gym-core.BerzerkSync-v3),
 EnvSpec(flashgames.4x4Monster3-v0),
 EnvSpec(wob.mini.CopyPaste2-v0),
 EnvSpec(gym-core.BerzerkSync-v0),
 EnvSpec(AmidarNoFrameskip-v0),
 EnvSpec(flashgames.BubbleTanksTd15-v0),
 EnvSpec(AmidarNoFrameskip-v4),
 EnvSpec(gym-core.CrazyClimberDeterministic-v3),
 EnvSpec(flashgames.LuxUltimate-v0),
 EnvSpec(flashgames.ZodiacMatch-v0),
 EnvSpec(gym-core.Riverraid30FPS-v0),
 EnvSpec(gym-core.AirRaidDeterministicSlow-v0),
 EnvSpec(gym-core.AirRaidDeterministicSlow-v3),
 EnvSpec(flashgames.Cloud9-v0),
 EnvSpec(gym-core.WizardOfWorDeterministic-v3),
 EnvSpec(gym-core.WizardOfWorDeterministic-v0),
 EnvSpec(Asteroids-ram-v0),
 EnvSpec(gym-core.MsPacmanSlow-v0),
 EnvSpec(Asteroids-ram-v4),
 EnvSpec(gym-core.ZaxxonNoFrameskip-v3),
 EnvSpec(gym-core.Enduro-v3),
 EnvSpec(gym-core.Enduro-v0),
 EnvSpec(gym-core.ZaxxonNoFrameskip-v0),
 EnvSpec(flashgames.NewSplitterPals-v0),
 EnvSpec(flashgames.21Balloons-v0),
 EnvSpec(flashgames.Match3Adventure-v0),
 EnvSpec(flashgames.DragonVsMonster-v0),
 EnvSpec(flashgames.HeatRushFutureLvl5-v0),
 EnvSpec(Solaris-ram-v0),
 EnvSpec(Solaris-ram-v4),
 EnvSpec(flashgames.EpicDefender-v0),
 EnvSpec(flashgames.DragonChain-v0),
 EnvSpec(flashgames.FootballHeads201314Ligue1-v0),
 EnvSpec(flashgames.LonelyEscapeAsylum-v0),
 EnvSpec(MontezumaRevengeDeterministic-v0),
 EnvSpec(flashgames.MonkeyBlast-v0),
 EnvSpec(wob.real.BookFlight-Delta-v0),
 EnvSpec(ReversedAddition-v0),
 EnvSpec(gym-core.YarsRevengeSlow-v3),
 EnvSpec(gym-core.YarsRevengeSlow-v0),
 EnvSpec(wob.mini.EmailInbox-v0),
 EnvSpec(flashgames.VectorRunner-v0),
 EnvSpec(RiverraidNoFrameskip-v4),
 EnvSpec(RiverraidNoFrameskip-v0),
 EnvSpec(flashgames.AmericanRacingLvl13-v0),
 EnvSpec(BipedalWalkerHardcore-v2),
 EnvSpec(flashgames.AmericanRacingLvl22-v0),
 EnvSpec(gym-core.PrivateEye-v0),
 EnvSpec(gym-core.PrivateEye-v3),
 EnvSpec(flashgames.CoasterCars2Megacross-v0),
 EnvSpec(flashgames.SkyIsland-v0),
 EnvSpec(flashgames.SuperbikeExtreme-v0),
 EnvSpec(gym-core.KangarooSync-v3),
 EnvSpec(flashgames.SpacePunkRacerLvl5-v0),
 EnvSpec(flashgames.MarshmallowsEscape-v0),
 EnvSpec(gym-core.KangarooSync-v0),
 EnvSpec(gym-core.AmidarDeterministicSync-v3),
 EnvSpec(gym-core.AmidarDeterministicSync-v0),
 EnvSpec(Reacher-v1),
 EnvSpec(gym-core.BankHeistNoFrameskip-v0),
 EnvSpec(gym-core.BankHeistNoFrameskip-v3),
 EnvSpec(flashgames.FinalSiege-v0),
 EnvSpec(wob.real.Quizlet-Planet-Learn-v0),
 EnvSpec(flashgames.ClimbingSanta-v0),
 EnvSpec(gym-core.ChopperCommand-v0),
 EnvSpec(Tutankham-ramDeterministic-v4),
 EnvSpec(gym-core.ChopperCommand-v3),
 EnvSpec(Tutankham-ramDeterministic-v0),
 EnvSpec(flashgames.HeatRushFutureLvl4-v0),
 EnvSpec(gym-core.FrostbiteSlow-v0),
 EnvSpec(flashgames.CowboyVsUfos-v0),
 EnvSpec(gym-core.FrostbiteSlow-v3),
 EnvSpec(ElevatorAction-ramNoFrameskip-v0),
 EnvSpec(ElevatorAction-ramNoFrameskip-v4),
 EnvSpec(flashgames.CoasterCars2Contact-v0),
 EnvSpec(flashgames.JungleEagle-v0),
 EnvSpec(flashgames.CoasterRacerLvl2-v0),
 EnvSpec(flashgames.HighSpeedChase-v0),
 EnvSpec(gym-core.FreewayDeterministicSync-v0),
 EnvSpec(flashgames.HungerHunter-v0),
 EnvSpec(gym-core.FreewayDeterministicSync-v3),
 EnvSpec(gym-core.StarGunnerDeterministicSync-v3),
 EnvSpec(Bowling-v0),
 EnvSpec(PrivateEye-ramNoFrameskip-v0),
 EnvSpec(flashgames.RunRunRan-v0),
 EnvSpec(Bowling-v4),
 EnvSpec(PrivateEye-ramNoFrameskip-v4),
 EnvSpec(PitfallNoFrameskip-v0),
 EnvSpec(wob.real.Signup-12-v0),
 EnvSpec(flashgames.DriftRunners2-v0),
 EnvSpec(PitfallNoFrameskip-v4),
 EnvSpec(wob.real.Signup-4-v0),
 EnvSpec(CentipedeNoFrameskip-v0),
 EnvSpec(flashgames.MedievalShark-v0),
 EnvSpec(CentipedeNoFrameskip-v4),
 EnvSpec(gym-core.MontezumaRevengeDeterministicSlow-v3),
 EnvSpec(gym-core.MontezumaRevengeDeterministicSlow-v0),
 EnvSpec(gym-core.WizardOfWorDeterministicSync-v0),
 EnvSpec(gym-core.WizardOfWorDeterministicSync-v3),
 EnvSpec(flashgames.DinoBubble-v0),
 EnvSpec(flashgames.BubbleMover-v0),
 EnvSpec(flashgames.CoasterRacerLvl3-v0),
 EnvSpec(Riverraid-ramNoFrameskip-v4),
 EnvSpec(gym-core.DemonAttackDeterministic-v3),
 EnvSpec(Riverraid-ramNoFrameskip-v0),
 EnvSpec(wob.MiniWorldOfBits-v0),
 EnvSpec(wob.real.Signup-2-v0),
 EnvSpec(Alien-ramNoFrameskip-v4),
 EnvSpec(PrivateEye-v0),
 EnvSpec(flashgames.RedBeard-v0),
 EnvSpec(FrozenLake8x8-v0),
 EnvSpec(Alien-ramNoFrameskip-v0),
 EnvSpec(PrivateEye-v4),
 EnvSpec(gym-core.CartPoleLowDSync-v0),
 EnvSpec(flashgames.DnaLabRush-v0),
 EnvSpec(gym-core.Berzerk30FPS-v0),
 EnvSpec(gym-core.Berzerk30FPS-v3),
 EnvSpec(LunarLanderContinuous-v2),
 EnvSpec(flashgames.HeatRushFutureLvl12-v0),
 EnvSpec(Asterix-ram-v0),
 EnvSpec(flashgames.NinjaPainter-v0),
 EnvSpec(Asterix-ram-v4),
 EnvSpec(flashgames.GSwitch-v0),
 EnvSpec(flashgames.30Seconds-v0),
 EnvSpec(wob.real.Quizlet-Geography-Learn-v0),
 EnvSpec(wob.real.Quizlet-Comet-Test-v0),
 EnvSpec(QbertDeterministic-v0),
 EnvSpec(QbertDeterministic-v4),
 EnvSpec(flashgames.SpinSprint-v0),
 EnvSpec(flashgames.SmileyPuzzle-v0),
 EnvSpec(SolarisDeterministic-v0),
 EnvSpec(gym-core.Riverraid-v0),
 EnvSpec(gym-core.TennisNoFrameskip-v3),
 EnvSpec(SolarisDeterministic-v4),
 EnvSpec(gym-core.TennisNoFrameskip-v0),
 EnvSpec(flashgames.DriftRunners3d-v0),
 EnvSpec(JamesbondNoFrameskip-v4),
 EnvSpec(flashgames.BugsGotGuns-v0),
 EnvSpec(TwoRoundDeterministicReward-v0),
 EnvSpec(flashgames.NeonRace2-v0),
 EnvSpec(gym-core.StarGunnerDeterministicSlow-v0),
 EnvSpec(flashgames.BunnyCannon-v0),
 EnvSpec(gym-core.Frostbite-v0),
 EnvSpec(gym-core.Frostbite-v3),
 EnvSpec(PhoenixDeterministic-v4),
 EnvSpec(flashgames.Mrbirdie-v0),
 EnvSpec(PhoenixDeterministic-v0),
 EnvSpec(StarGunner-ramDeterministic-v4),
 EnvSpec(PredictActionsCartpole-v0),
 EnvSpec(flashgames.FlashDrive-v0),
 EnvSpec(StarGunner-ramDeterministic-v0),
 EnvSpec(BeamRider-ramDeterministic-v4),
 EnvSpec(BeamRider-ramDeterministic-v0),
 EnvSpec(flashgames.SmashTheSwine-v0),
 EnvSpec(gym-core.SpaceInvadersDeterministic-v3),
 EnvSpec(gym-core.SpaceInvadersDeterministic-v0),
 EnvSpec(flashgames.CemeteryRoad-v0),
 EnvSpec(flashgames.EvolutionRacingLvl5-v0),
 EnvSpec(wob.mini.ClickTab2-v0),
 EnvSpec(gym-core.BoxingDeterministicSync-v3),
 EnvSpec(flashgames.JollySwipe-v0),
 EnvSpec(gym-core.KangarooDeterministicSlow-v3),
 EnvSpec(NameThisGame-ramNoFrameskip-v4),
 EnvSpec(wob.mini.UseColorwheel-v0),
 EnvSpec(PongDeterministic-v4),
 EnvSpec(Pong-ramNoFrameskip-v0),
 EnvSpec(Pong-ramDeterministic-v0),
 EnvSpec(flashgames.Devilment-v0),
 EnvSpec(Pong-ramDeterministic-v4),
 EnvSpec(wob.mini.HighlightText2-v0),
 EnvSpec(gym-core.BankHeist30FPS-v3),
 EnvSpec(gym-core.BankHeist30FPS-v0),
 EnvSpec(gym-core.NameThisGameNoFrameskip-v3),
 EnvSpec(NameThisGame-ramNoFrameskip-v0),
 EnvSpec(gym-core.NameThisGameNoFrameskip-v0),
 EnvSpec(gym-core.PongSlow-v0),
 EnvSpec(gym-core.PongSlow-v3),
 EnvSpec(flashgames.ProjectMonochrome-v0),
 EnvSpec(gym-core.PooyanSlow-v3),
 EnvSpec(TutankhamNoFrameskip-v4),
 EnvSpec(flashgames.VirtualRacer-v0),
 EnvSpec(TutankhamNoFrameskip-v0),
 EnvSpec(flashgames.DotGrowth-v0),
 EnvSpec(flashgames.VanguardWars-v0),
 EnvSpec(gym-core.UpNDownSlow-v0),
 EnvSpec(wob.mini.EnterPassword-v0),
 EnvSpec(flashgames.MiniMachines-v0),
 EnvSpec(wob.mini.Terminal-v0),
 EnvSpec(gym-core.JourneyEscapeDeterministicSlow-v3),
 EnvSpec(gym-core.JourneyEscapeDeterministicSlow-v0),
 EnvSpec(flashgames.EvolutionRacingLvl16-v0),
 ...]

Function to load arguments into play


In [18]:
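def loadarguments():
    """Build the environment, shared model and optimizer from the global args dict."""
    global env_conf
    global env
    global setup_json
    global shared_model
    global saved_state
    global optimizer

    undo_logger_setup()

    torch.set_default_tensor_type('torch.FloatTensor')
    torch.manual_seed(args['seed'])

    # Read the per-environment settings (crop and resize info)
    setup_json = read_config(args['EC'])

    # Start from the entry named by args['config'], then prefer any entry
    # whose key occurs in the environment id (e.g. 'MsPacman' in 'MsPacman-v0')
    env_conf = setup_json[args['config']]
    for i in setup_json.keys():
        if i in args['ENV']:
            env_conf = setup_json[i]
    env = atari_env(args['ENV'], env_conf)

    shared_model = A3Clstm(env.observation_space.shape[0], env.action_space)
    if args['L']:
        # Load previously saved weights from <LMD><ENV>.dat
        saved_state = torch.load(
            '{0}{1}.dat'.format(args['LMD'], args['ENV']))
        shared_model.load_state_dict(saved_state)
    # Put the parameters in shared memory so every worker process sees them
    shared_model.share_memory()

    if args['SO']:
        if args['OPT'] == 'RMSprop':
            optimizer = SharedRMSprop(shared_model.parameters(), lr=args['LR'])
        elif args['OPT'] == 'Adam':
            optimizer = SharedAdam(shared_model.parameters(), lr=args['LR'])
        elif args['OPT'] == 'LrSchedAdam':
            optimizer = SharedLrSchedAdam(
                shared_model.parameters(), lr=args['LR'])
        else:
            # Fail early instead of hitting an undefined (or stale) optimizer
            raise ValueError('Unknown optimizer: {0}'.format(args['OPT']))
        optimizer.share_memory()
    else:
        optimizer = None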
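
The environment configuration lookup in loadarguments() can be illustrated in isolation. The following is a minimal sketch, assuming a settings dictionary shaped like the one read from the EC file; the helper name select_env_conf and the dictionary contents are hypothetical and are not taken from the real settings.json.

```python
def select_env_conf(setup, env_id, fallback_key):
    """Pick the settings entry whose key occurs in env_id, else the fallback."""
    env_conf = setup[fallback_key]
    for key in setup.keys():
        # Substring match, e.g. 'MsPacman' occurs in 'MsPacman-v0'
        if key in env_id:
            env_conf = setup[key]
    return env_conf


# Hypothetical settings; the real file holds crop and resize info instead.
example_setup = {
    "Default": {"label": "generic settings"},
    "MsPacman": {"label": "settings tuned for MsPacman"},
}

print(select_env_conf(example_setup, "MsPacman-v0", "Default")["label"])
print(select_env_conf(example_setup, "Pong-v0", "Default")["label"])
```

Because the match is a plain substring test, an environment id that contains no settings key falls back to the entry named by the config argument.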

Input Description

Parameter: LR
Type: float
Description: learning rate
Parameter: G
Type: float
Description: discount factor for rewards (default: 0.99)
Parameter: T
Type: float
Description: parameter for GAE (default: 1.00)
Parameter: seed
Type: int
Description: random seed (default: 42)
Parameter: W
Type: int
Description: how many training processes to use (default: 5)
Parameter: NS
Type: int
Description: number of forward steps in A3C (default: 20)
Parameter: M
Type: int
Description: maximum length of an episode (default: 10000)
Parameter: ENV
Description: environment to train on (default: Pong-v0)
Parameter: EC
Description: JSON file with the crop and resize info for each environment (default: settings.json)
Parameter: SO
Description: use an optimizer with statistics shared across workers (default: True)
Parameter: L
Description: load a trained model (default: False)
Parameter: SSL
Type: int
Description: reward score that the test evaluation must exceed before the model is saved (default: 20)
Parameter: OPT
Description: shared optimizer choice of Adam, LrSchedAdam or RMSprop (default: Adam)
Parameter: CL
Description: treat end of life as end of a training episode (default: False)
Parameter: LMD
Description: folder to load trained models from (default: '/modeldata/')
Parameter: SMD
Description: folder to save trained models (default: '/modeldata/')
Parameter: LG
Description: folder to save log (default: '/log/')
Parameter: config
Description: key in the EC file whose entry is used when no key matches ENV
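
To make the roles of G (discount factor) and T (GAE parameter) more concrete, here is a rough sketch of the standard generalized advantage estimation recursion. It is an illustration under assumptions, not necessarily line for line what train() does; the function name gae_advantages and its argument names are made up for this example.

```python
def gae_advantages(rewards, values, gamma=0.99, tau=1.00):
    """Generalized advantage estimates for one rollout.

    rewards: r_0 .. r_{n-1}
    values:  V(s_0) .. V(s_n); the last entry bootstraps the tail
    gamma:   discount factor (G), tau: GAE parameter (T)
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk the rollout backwards so each step can reuse the next step's estimate
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * tau * gae
        advantages[t] = gae
    return advantages


# Tiny example: three steps of reward 1 and a value function that is zero everywhere
print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, tau=1.0))
```

With T set to 1.00 the estimate reduces to the discounted return minus the value prediction; smaller values of T trade variance for bias.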

Running an Environment, Training and Simulating (more below)

1. Set L (Load) to False when there is no trained model for that particular game yet.
Once training has saved a model, set L to True.
2. Set SO to True so that the optimizer statistics are shared and learning accumulates across all workers.

Interrupt the kernel to stop training or testing.

Note: It is important to run all the cells above, but do not run everything below this point.

The cells below are grouped into sections; choose one section to run. For example, if you want to train, run only the Training section, or if you want to play Pac-Man, run only the cells in the PacMan section (from input to render).

Training Notes

It is important to limit the number of worker processes (W) to the number of CPU cores available. Running more than one worker per CPU core is detrimental to training speed and effectiveness.
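
One way to apply this rule is to cap W by the core count reported by the operating system. This is a small sketch using the standard library; the helper name cap_workers and the example_args dictionary are assumptions made for illustration.

```python
import multiprocessing


def cap_workers(requested, cores=None):
    """Return a worker count no larger than the number of CPU cores."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    # Always keep at least one worker
    return max(1, min(requested, cores))


# Example: keep the W entry of an args-style dictionary within the core count
example_args = {"W": 8}
example_args["W"] = cap_workers(example_args["W"])
print("Using {0} training workers".format(example_args["W"]))
```

Note that the training cell also starts one extra test process, so on a fully loaded machine it may be worth requesting one worker fewer than the core count.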

Training Section

Input Dictionary


In [19]:
args = {'LR': 0.0001, "G":0.99, "T":1.00,"W":8,"NS":100,"M":10000,"ENV":'MsPacman-v0',
         "EC":'./settings.json',"SO":True,"L":True,"SSL":20, "OPT":"Adam","CL":False,
         "LMD":'./modeldata/',"SMD":"./modeldata/","LG":'./log/', "seed":42,"config":"Default"
        }

loadarguments()

Run This to Train

This also logs the progress to a file.

In [ ]:
processes = []

p = Process(target=test, args=(args, shared_model, env_conf))
p.start()
processes.append(p)

time.sleep(0.1)
for rank in range(0, args['W']):
    p = Process(
        target=train, args=(rank, args, shared_model, optimizer, env_conf))
    p.start()
    processes.append(p)
for p in processes:
    p.join()
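
Interrupting the kernel while the cell above is waiting in join() can leave worker processes running. A possible cleanup helper is sketched below; stop_all is a hypothetical name, and it relies only on the is_alive(), terminate() and join() methods that process objects provide.

```python
def stop_all(processes, timeout=5.0):
    """Terminate every worker that is still running, then wait for each to exit."""
    for p in processes:
        if p.is_alive():
            p.terminate()
    for p in processes:
        # join with a timeout so a stuck worker cannot block the notebook forever
        p.join(timeout)


# Intended use around the join loop of the training cell:
#
# try:
#     for p in processes:
#         p.join()
# except KeyboardInterrupt:
#     stop_all(processes)
```

The timeout of 5 seconds is an arbitrary choice for this sketch.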

Playing the Atari Games Section

The best part

If this gives tensor errors, run the cells again; it usually works the second time. This is because the full shared optimizer data has not been generated yet.

Loading a model is disabled by default so that you can observe how the agent learns through iteration. Set L to True if you want to load the trained models and see how well they perform.

Playing PacMan (10000 episodes)

Input Parameters


In [20]:
args = {'LR': 0.0001, "G":0.99, "T":1.00,"W":8,"NS":20,"M":1000000,"ENV":'MsPacman-v0',
         "EC":'./settings.json',"SO":True,"L":True,"SSL":20, "OPT":"Adam","CL":False,
         "LMD":'./modeldata/',"SMD":"./modeldata/","LG":'./log/', "seed":42,"config":"MsPacman"
        }


loadarguments()

Run this to Start


In [21]:
test(args, shared_model, env_conf,render=True)


2017-11-29 23:00:51,534 : OPT: Adam
2017-11-29 23:00:51,535 : LG: ./log/
2017-11-29 23:00:51,536 : SMD: ./modeldata/
2017-11-29 23:00:51,537 : ENV: MsPacman-v0
2017-11-29 23:00:51,538 : G: 0.99
2017-11-29 23:00:51,539 : CL: False
2017-11-29 23:00:51,540 : config: MsPacman
2017-11-29 23:00:51,541 : M: 1000000
2017-11-29 23:00:51,542 : L: True
2017-11-29 23:00:51,543 : EC: ./settings.json
2017-11-29 23:00:51,543 : SSL: 20
2017-11-29 23:00:51,544 : seed: 42
2017-11-29 23:00:51,545 : LR: 0.0001
2017-11-29 23:00:51,545 : T: 1.0
2017-11-29 23:00:51,546 : W: 8
2017-11-29 23:00:51,547 : SO: True
2017-11-29 23:00:51,548 : NS: 20
2017-11-29 23:00:51,548 : LMD: ./modeldata/
2017-11-29 23:01:09,454 : Time 00h 00m 17s, episode reward 5040.0, episode length 2059, reward mean 5040.0000
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-21-40b60a3004a2> in <module>()
----> 1 test(args, shared_model, env_conf,render=True)

<ipython-input-15-12a1f9002f83> in test(args, shared_model, env_conf, render)
     51             player.eps_len = 0
     52             state = player.env.reset()
---> 53             time.sleep(60)
     54             player.state = torch.from_numpy(state).float()

KeyboardInterrupt: 

Playing BeamRider (4000 episodes)

Input


In [ ]:
args = {'LR': 0.0001, "G":0.99, "T":1.00,"W":8,"NS":20,"M":4000,"ENV":'BeamRider-v0',
         "EC":'./settings.json',"SO":True,"L":True,"SSL":20, "OPT":"Adam","CL":False,
         "LMD":'./modeldata/',"SMD":"./modeldata/","LG":'./log/', "seed":42,"config":"BeamRider"
        }


loadarguments()

Run this to Start


In [ ]:
test(args, shared_model, env_conf,render=True)

Playing Breakout (3000 episodes)

Input


In [40]:
args = {'LR': 0.0001, "G":0.99, "T":1.00, "S":1,"W":8,"NS":20,"M":3000,"ENV":'Breakout-v0',
         "EC":'./settings.json',"SO":True,"L":True,"SSL":20, "OPT":"Adam","CL":False,
         "LMD":'./modeldata/',"SMD":"./modeldata/","LG":'./log/', "seed":42,"config":"Breakout"
        }


loadarguments()


While copying the parameter named actor_linear.weight, whose dimensions in the model are torch.Size([4, 512]) and whose dimensions in the checkpoint are torch.Size([6, 512]), ...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-40-3d158bb5c32d> in <module>()
      5 
      6 
----> 7 loadarguments()

<ipython-input-21-aca11d89d648> in loadarguments()
     27         saved_state = torch.load(
     28             '{0}{1}.dat'.format(args['LMD'], args['ENV']))
---> 29         shared_model.load_state_dict(saved_state)
     30     shared_model.share_memory()
     31 

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/modules/module.pyc in load_state_dict(self, state_dict)
    358                 param = param.data
    359             try:
--> 360                 own_state[name].copy_(param)
    361             except:
    362                 print('While copying the parameter named {}, whose dimensions in the model are'

RuntimeError: inconsistent tensor size, expected tensor [4 x 512] and src [6 x 512] to have the same number of elements, but got 2048 and 3072 elements respectively at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/TH/generic/THTensorCopy.c:86
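
The error above says that the model built for this environment has an actor_linear.weight of size [4 x 512], while the checkpoint on disk holds [6 x 512], so the saved file was produced for a model with 6 actions rather than 4. A quick way to spot such problems before calling load_state_dict is to compare parameter shapes. The sketch below works on plain dictionaries of shapes; find_shape_mismatches is a hypothetical helper, and with PyTorch the shapes could be collected with something like {k: tuple(v.size()) for k, v in shared_model.state_dict().items()}.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for parameters that differ."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Only report parameters present in both with different shapes
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Shapes taken from the error message above
model_shapes = {"actor_linear.weight": (4, 512)}
checkpoint_shapes = {"actor_linear.weight": (6, 512)}

for name, (have, found) in find_shape_mismatches(model_shapes, checkpoint_shapes).items():
    print("{0}: model {1}, checkpoint {2}".format(name, have, found))
```

If a mismatch is reported, the checkpoint belongs to a different action space and the model needs to be retrained (or L set to False) for this environment.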

Run this to Start


In [41]:
test(args, shared_model, env_conf,render=True)


2017-11-24 14:52:44,553 : OPT: Adam
2017-11-24 14:52:44,555 : LG: ./log/
2017-11-24 14:52:44,556 : SMD: ./modeldata/
2017-11-24 14:52:44,557 : ENV: Breakout-v0
2017-11-24 14:52:44,559 : G: 0.99
2017-11-24 14:52:44,561 : CL: False
2017-11-24 14:52:44,563 : config: Breakout
2017-11-24 14:52:44,565 : M: 3000
2017-11-24 14:52:44,567 : L: True
2017-11-24 14:52:44,570 : EC: ./settings.json
2017-11-24 14:52:44,572 : SSL: 20
2017-11-24 14:52:44,574 : S: 1
2017-11-24 14:52:44,576 : seed: 42
2017-11-24 14:52:44,578 : LR: 0.0001
2017-11-24 14:52:44,580 : T: 1.0
2017-11-24 14:52:44,583 : W: 8
2017-11-24 14:52:44,585 : SO: True
2017-11-24 14:52:44,587 : NS: 20
2017-11-24 14:52:44,588 : LMD: ./modeldata/
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-41-40b60a3004a2> in <module>()
----> 1 test(args, shared_model, env_conf,render=True)

<ipython-input-18-12a1f9002f83> in test(args, shared_model, env_conf, render)
     27         if render:
     28             env.render()
---> 29         player.action_test()
     30         reward_sum += player.reward
     31 

<ipython-input-14-04b357b58ab4> in action_test(self)
     48             self.cx = Variable(self.cx.data, volatile=True)
     49             self.hx = Variable(self.hx.data, volatile=True)
---> 50         value, logit, (self.hx, self.cx) = self.model((Variable(self.state.unsqueeze(0), volatile=True), (self.hx, self.cx)))
     51         prob = F.softmax(logit)
     52         action = prob.max(1)[1].data.numpy()

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

<ipython-input-13-bc8f4f957771> in forward(self, inputs)
     32         inputs, (hx, cx) = inputs
     33         x = F.relu(self.maxp1(self.conv1(inputs)))
---> 34         x = F.relu(self.maxp2(self.conv2(x)))
     35         x = F.relu(self.maxp3(self.conv3(x)))
     36         x = F.relu(self.maxp4(self.conv4(x)))

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/modules/conv.pyc in forward(self, input)
    252     def forward(self, input):
    253         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 254                         self.padding, self.dilation, self.groups)
    255 
    256 

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/functional.pyc in conv2d(input, weight, bias, stride, padding, dilation, groups)
     50     f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False,
     51                _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled)
---> 52     return f(input, weight, bias)
     53 
     54 

KeyboardInterrupt: 
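
The KeyboardInterrupt above is what the notebook prints when the kernel is interrupted while test() is still stepping the environment. A minimal sketch of stopping such a loop without a traceback is shown below. The helper run_until_interrupted and its step_fn and max_steps parameters are hypothetical names for illustration and are not part of this notebook.

```python
from __future__ import print_function


def run_until_interrupted(step_fn, max_steps=None):
    """Call step_fn() repeatedly and stop quietly on a kernel interrupt.

    step_fn and max_steps are hypothetical names used only in this sketch.
    Returns the number of steps that completed before stopping.
    """
    steps = 0
    try:
        # Loop forever when max_steps is None, otherwise stop at the limit.
        while max_steps is None or steps < max_steps:
            step_fn()
            steps += 1
    except KeyboardInterrupt:
        # Ctrl-C or "Interrupt kernel" lands here instead of printing a traceback.
        print("Interrupted after", steps, "steps")
    return steps


# Usage: run a no-op step three times.
completed = run_until_interrupted(lambda: None, max_steps=3)
```

In the notebook, step_fn could wrap one iteration of the loop inside test(), but that wiring is an assumption and is not shown here.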

Playing SpaceInvaders (10000 episodes)

Input


In [22]:
args = {
    "LR": 0.001, "G": 0.99, "T": 1.00, "S": 1, "W": 8, "NS": 20, "M": 1000000,
    "ENV": "SpaceInvaders-v0", "EC": "./settings.json",
    "SO": True, "L": True, "SSL": 20, "OPT": "Adam", "CL": False,
    "LMD": "./modeldata/", "SMD": "./modeldata/", "LG": "./log/",
    "seed": 42, "config": "SpaceInvaders",
}


loadarguments()
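
Each setting in args shows up in the run output as one "KEY: value" line in the '%(asctime)s : %(message)s' format used by setup_logger. A minimal sketch of how such lines can be produced is shown below. The helper log_args, the logger name "demo" and the three-entry demo_args dict are assumptions for illustration; the notebook's own logging code may differ, and it does not sort the keys.

```python
from __future__ import print_function
import logging


def log_args(args, logger):
    """Log one 'KEY: value' line per setting and return the lines.

    Keys are sorted here only to make the output order predictable.
    """
    lines = []
    for key in sorted(args):
        line = "{0}: {1}".format(key, args[key])
        logger.info(line)
        lines.append(line)
    return lines


# Same message format as the notebook's setup_logger helper.
logging.basicConfig(format="%(asctime)s : %(message)s", level=logging.INFO)

# A small hypothetical subset of the settings, enough to show the format.
demo_args = {"ENV": "SpaceInvaders-v0", "LR": 0.001, "OPT": "Adam"}
logged = log_args(demo_args, logging.getLogger("demo"))
```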

Run this cell to start


In [23]:
test(args, shared_model, env_conf,render=True)


2017-11-24 16:45:17,533 : OPT: Adam
2017-11-24 16:45:17,534 : LG: ./log/
2017-11-24 16:45:17,535 : SMD: ./modeldata/
2017-11-24 16:45:17,536 : ENV: SpaceInvaders-v0
2017-11-24 16:45:17,537 : G: 0.99
2017-11-24 16:45:17,538 : CL: False
2017-11-24 16:45:17,539 : config: SpaceInvaders
2017-11-24 16:45:17,539 : M: 1000000
2017-11-24 16:45:17,541 : L: True
2017-11-24 16:45:17,542 : EC: ./settings.json
2017-11-24 16:45:17,543 : SSL: 20
2017-11-24 16:45:17,544 : S: 1
2017-11-24 16:45:17,545 : seed: 42
2017-11-24 16:45:17,546 : LR: 0.001
2017-11-24 16:45:17,547 : T: 1.0
2017-11-24 16:45:17,547 : W: 8
2017-11-24 16:45:17,548 : SO: True
2017-11-24 16:45:17,549 : NS: 20
2017-11-24 16:45:17,551 : LMD: ./modeldata/
2017-11-24 16:45:49,833 : Time 00h 00m 32s, episode reward 2895.0, episode length 3753, reward mean 2895.0000
2017-11-24 16:47:10,696 : Time 00h 01m 52s, episode reward 2340.0, episode length 2343, reward mean 2617.5000
2017-11-24 16:49:32,367 : Time 00h 04m 14s, episode reward 11355.0, episode length 8869, reward mean 5530.0000
2017-11-24 16:51:09,094 : Time 00h 05m 51s, episode reward 5270.0, episode length 4445, reward mean 5465.0000
2017-11-24 16:52:45,807 : Time 00h 07m 28s, episode reward 5010.0, episode length 4336, reward mean 5374.0000
2017-11-24 16:54:12,251 : Time 00h 08m 54s, episode reward 3320.0, episode length 3126, reward mean 5031.6667
2017-11-24 16:56:03,930 : Time 00h 10m 46s, episode reward 7445.0, episode length 6303, reward mean 5376.4286
2017-11-24 16:57:41,828 : Time 00h 12m 24s, episode reward 5270.0, episode length 4685, reward mean 5363.1250
2017-11-24 16:59:05,473 : Time 00h 13m 47s, episode reward 2750.0, episode length 2639, reward mean 5072.7778
2017-11-24 17:00:51,077 : Time 00h 15m 33s, episode reward 6550.0, episode length 5415, reward mean 5220.5000
2017-11-24 17:02:27,071 : Time 00h 17m 09s, episode reward 4925.0, episode length 4348, reward mean 5193.6364
2017-11-24 17:03:45,683 : Time 00h 18m 27s, episode reward 2085.0, episode length 2270, reward mean 4934.5833
2017-11-24 17:06:06,522 : Time 00h 20m 48s, episode reward 12540.0, episode length 10000, reward mean 5519.6154
2017-11-24 17:07:46,573 : Time 00h 22m 28s, episode reward 5310.0, episode length 4782, reward mean 5504.6429
2017-11-24 17:09:26,329 : Time 00h 24m 08s, episode reward 5900.0, episode length 4818, reward mean 5531.0000
2017-11-24 17:11:13,932 : Time 00h 25m 56s, episode reward 6985.0, episode length 5752, reward mean 5621.8750
2017-11-24 17:12:33,536 : Time 00h 27m 15s, episode reward 2445.0, episode length 2344, reward mean 5435.0000
2017-11-24 17:14:08,068 : Time 00h 28m 50s, episode reward 4930.0, episode length 4200, reward mean 5406.9444
2017-11-24 17:15:48,918 : Time 00h 30m 31s, episode reward 5885.0, episode length 5002, reward mean 5432.1053
2017-11-24 17:17:28,062 : Time 00h 32m 10s, episode reward 5420.0, episode length 4680, reward mean 5431.5000
2017-11-24 17:18:56,378 : Time 00h 33m 38s, episode reward 3920.0, episode length 3476, reward mean 5359.5238
2017-11-24 17:20:44,839 : Time 00h 35m 27s, episode reward 6870.0, episode length 5807, reward mean 5428.1818
2017-11-24 17:22:14,225 : Time 00h 36m 56s, episode reward 4035.0, episode length 3541, reward mean 5367.6087
2017-11-24 17:23:33,875 : Time 00h 38m 16s, episode reward 2385.0, episode length 2365, reward mean 5243.3333
2017-11-24 17:24:56,261 : Time 00h 39m 38s, episode reward 2520.0, episode length 2723, reward mean 5134.4000
2017-11-24 17:26:03,281 : Time 00h 40m 45s, episode reward 540.0, episode length 812, reward mean 4957.6923
ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.py", line 1132, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.py", line 313, in wrapped
    return f(*args, **kwargs)
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.py", line 358, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/inspect.py", line 1049, in getinnerframes
    framelist.append((tb.tb_frame,) + getframeinfo(tb, context))
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/inspect.py", line 1013, in getframeinfo
    lines, lnum = findsource(frame)
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.py", line 170, in findsource
    file = getsourcefile(object) or getfile(object)
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/inspect.py", line 454, in getsourcefile
    if hasattr(getmodule(object, filename), '__loader__'):
  File "/home/nasdin/anaconda3/envs/py27/lib/python2.7/inspect.py", line 490, in getmodule
    for modname, module in sys.modules.items():
KeyboardInterrupt
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_code(self, code_obj, result)
   2897             if result is not None:
   2898                 result.error_in_exec = sys.exc_info()[1]
-> 2899             self.showtraceback()
   2900         else:
   2901             outflag = 0

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in showtraceback(self, exc_tuple, filename, tb_offset, exception_only)
   1824                     except Exception:
   1825                         stb = self.InteractiveTB.structured_traceback(etype,
-> 1826                                             value, tb, tb_offset=tb_offset)
   1827 
   1828                     self._showtraceback(etype, value, stb)

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.pyc in structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1410         self.tb = tb
   1411         return FormattedTB.structured_traceback(
-> 1412             self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1413 
   1414 

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.pyc in structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1318             # Verbose modes need a full traceback
   1319             return VerboseTB.structured_traceback(
-> 1320                 self, etype, value, tb, tb_offset, number_of_lines_of_context
   1321             )
   1322         else:

/home/nasdin/anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/ultratb.pyc in structured_traceback(self, etype, evalue, etb, tb_offset, number_of_lines_of_context)
   1202                 structured_traceback_parts += formatted_exception
   1203         else:
-> 1204             structured_traceback_parts += formatted_exception[0]
   1205 
   1206         return structured_traceback_parts

IndexError: string index out of range
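
The "reward mean" column in the log above is consistent with a cumulative average over all episodes played so far, for example (2895.0 + 2340.0 + 11355.0) / 3 = 5530.0 after the third episode. A short sketch that parses log lines and recomputes that average is shown below. The regular expression and the helper names parse_log and running_means are assumptions for illustration; the three sample lines are copied from the output above.

```python
from __future__ import print_function, division
import re

# Three lines copied verbatim from the log output above.
LOG_LINES = [
    "2017-11-24 16:45:49,833 : Time 00h 00m 32s, episode reward 2895.0, episode length 3753, reward mean 2895.0000",
    "2017-11-24 16:47:10,696 : Time 00h 01m 52s, episode reward 2340.0, episode length 2343, reward mean 2617.5000",
    "2017-11-24 16:49:32,367 : Time 00h 04m 14s, episode reward 11355.0, episode length 8869, reward mean 5530.0000",
]

PATTERN = re.compile(
    r"episode reward ([\d.]+), episode length (\d+), reward mean ([\d.]+)"
)


def parse_log(lines):
    """Return a list of (reward, length, logged_mean) tuples."""
    rows = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            rows.append((float(match.group(1)),
                         int(match.group(2)),
                         float(match.group(3))))
    return rows


def running_means(rewards):
    """Return the cumulative average after each reward."""
    total = 0.0
    means = []
    for count, reward in enumerate(rewards, start=1):
        total += reward
        means.append(total / count)
    return means


rows = parse_log(LOG_LINES)
recomputed = running_means([row[0] for row in rows])
for (reward, length, logged_mean), mean in zip(rows, recomputed):
    print(reward, length, logged_mean, round(mean, 4))
```

Feeding the full log through parse_log gives a quick check that the logged means match the raw episode rewards.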
